cornetto

Cornetto Toolkit

Compiling the Cornetto C programme

Building the Cornetto C programme requires a compiler that supports the C99 standard (with X/Open 7 POSIX 2008 extensions). Such compilers are widely available. To build:

sudo apt-get install zlib1g-dev   #install zlib development libraries
git clone https://github.com/hasindu2008/cornetto
cd cornetto
make

The commands to install the zlib development libraries on some popular distributions:

On Debian/Ubuntu : sudo apt-get install zlib1g-dev
On Fedora/CentOS : sudo dnf/yum install zlib-devel
On OS X : brew install zlib
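
If you script the setup, one way to choose among the commands above is to check which package manager is on the PATH. The following is a minimal sketch and not part of Cornetto: the function name zlib_install_cmd is made up for illustration, and it only prints the matching command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Cornetto): print the zlib install
# command that matches the first package manager found on the PATH.
zlib_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install zlib1g-dev"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install zlib-devel"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install zlib-devel"
    elif command -v brew >/dev/null 2>&1; then
        echo "brew install zlib"
    else
        echo "no known package manager found; install zlib headers manually" >&2
        return 1
    fi
}

# Print the suggestion; do not abort the calling script if none is found.
zlib_install_cmd || true
```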

Using helper scripts

Telomere stats

To get the telomere statistics of your assembly (vertebrates with the TTAGGG telomere sequence), use the script scripts/telostats.sh. This script uses cornetto subcommands that implement the functionality of the telomere analysis scripts from the VGP project. If your assembly is asm.fasta:

scripts/telostats.sh asm.fasta

Output:

Example: If you run this on the HG002 Q100 genome, you should see 46 contigs with 2 telomeres at the end.

scripts/telostats.sh hg002v1.0.1.fasta

$ contigs with 2 telo:       46
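
To check this count in a script, the number can be pulled out of the summary line with awk. This is a minimal sketch that assumes the line is labelled exactly as in the example above; it runs on a copy of that line, and in practice you would pipe the output of scripts/telostats.sh into the same awk command (assuming the summary is written to stdout).

```shell
#!/bin/sh
# Sample line copied from the HG002 example above.
sample='contigs with 2 telo:       46'

# Split on ':' and strip the padding from the value field.
t2t_count=$(printf '%s\n' "$sample" |
    awk -F: '/contigs with 2 telo/ { gsub(/[ \t]/, "", $2); print $2 }')

echo "$t2t_count"
```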

Dotplot

To generate a dotplot of the mapping between your assembly and a reference genome, use scripts/minidotplot.sh. This script requires minimap2 and samtools. If your reference is ref.fasta and your assembly is asm.fasta:

scripts/minidotplot.sh ref.fasta asm.fasta

Output:

Examples:

# dot plot of a hifiasm primary assembly against the chm13 haploid cell-line reference
awk '/^S/{print ">"$2;print $3}' asm.p_ctg.gfa > asm.fasta
scripts/minidotplot.sh chm13.fa asm.fasta

# dot plot of a hifiasm primary assembly against a "haploid" version of the HG002 Q100 reference
samtools faidx hg002v1.0.1.fasta
grep "PATERNAL\|chrEBV\|chrM\|chrX\|chrY" hg002v1.0.1.fasta.fai | cut -f 1 > paternal.txt
samtools faidx hg002v1.0.1.fasta -r paternal.txt -o hg002v1.0.1_pat.fasta
scripts/minidotplot.sh hg002v1.0.1_pat.fasta asm.fasta

# dot plot of hifiasm hap1+hap2 against HG002 Q100 diploid reference
awk '/^S/{print ">"$2;print $3}' asm.hap1.p_ctg.gfa > asm.hap1.fasta
awk '/^S/{print ">"$2;print $3}' asm.hap2.p_ctg.gfa > asm.hap2.fasta
cat asm.hap1.fasta asm.hap2.fasta > asm.hap1+hap2.fasta
scripts/minidotplot.sh hg002v1.0.1.fasta asm.hap1+hap2.fasta

# dot plot of a hifiasm hap1 against the chm13 haploid cell-line reference
scripts/minidotplot.sh chm13.fa asm.hap1.fasta
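
The awk one-liner used in the examples above turns every segment (S) line of a GFA file into a FASTA record, using the segment name as the header and the third column as the sequence. The toy GFA below is made up for illustration (the contig names only mimic hifiasm-style naming) and shows the effect on a graph with two segments:

```shell
#!/bin/sh
# Build a tiny, made-up GFA with a header, two segments and one link.
tmpdir=$(mktemp -d)
printf 'H\tVN:Z:1.0\nS\tptg000001l\tACGTACGT\tLN:i:8\nS\tptg000002l\tTTAGGG\tLN:i:6\nL\tptg000001l\t+\tptg000002l\t+\t0M\n' > "$tmpdir/toy.gfa"

# Same command as in the examples: keep only S lines, emit name and sequence.
awk '/^S/{print ">"$2;print $3}' "$tmpdir/toy.gfa" > "$tmpdir/toy.fasta"

cat "$tmpdir/toy.fasta"
# >ptg000001l
# ACGTACGT
# >ptg000002l
# TTAGGG
```

Header (H) and link (L) lines are dropped, so the result contains the contig sequences only.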

Assembly stats

To get per-chromosome statistics, use the script scripts/asmstats.sh. Run scripts/telostats.sh and scripts/minidotplot.sh first, because this script reuses the files they generate.

scripts/asmstats.sh asm.fasta

The output, which is printed to stdout, is explained here.

Examples:

# for a primary assembly called asm.fasta against chm13.fa
scripts/telostats.sh asm.fasta
scripts/minidotplot.sh chm13.fa asm.fasta
scripts/asmstats.sh asm.fasta

# for a primary assembly called asm.fasta against a haploid hg002 we created before
scripts/telostats.sh asm.fasta
scripts/minidotplot.sh hg002v1.0.1_pat.fasta asm.fasta
scripts/asmstats.sh asm.fasta

# for a diploid assembly called asm.hap1+hap2.fasta against hg002v1.0.1.fasta diploid reference
scripts/telostats.sh asm.hap1+hap2.fasta
scripts/minidotplot.sh hg002v1.0.1.fasta asm.hap1+hap2.fasta
scripts/asmstats.sh asm.hap1+hap2.fasta

# for haplotype 1 assembly called asm.hap1.fasta against the chm13
scripts/telostats.sh asm.hap1.fasta
scripts/minidotplot.sh chm13.fa asm.hap1.fasta
scripts/asmstats.sh asm.hap1.fasta
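
Because scripts/asmstats.sh depends on files produced by the other two scripts, the three calls in each example above can be wrapped so that they always run in the right order and stop at the first failure. This is a minimal sketch, assuming it is run from the root of the cornetto repository; the function name run_asm_eval is hypothetical and not part of Cornetto.

```shell
#!/bin/sh
# Hypothetical wrapper: run the three helper scripts in dependency order.
# Arguments: <reference FASTA> <assembly FASTA>
run_asm_eval() {
    if [ "$#" -ne 2 ] || [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "usage: run_asm_eval <ref.fasta> <asm.fasta>" >&2
        return 1
    fi
    # '&&' stops the chain as soon as one step fails, so asmstats.sh never
    # runs on missing or partial intermediate files.
    scripts/telostats.sh "$2" &&
    scripts/minidotplot.sh "$1" "$2" &&
    scripts/asmstats.sh "$2"
}

# Example: run_asm_eval chm13.fa asm.fasta
```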

Using individual commands

The steps performed by the scripts above can also be run manually using individual commands.

Generate a dot plot step by step

  1. First map your assembly to the reference using minimap2.
    minimap2 --eqx -cx asm5 ref.fasta asm.fasta > asm.paf
    

    ref.fasta and asm.fasta are the files described under the minidotplot.sh section above. You may want to change the preset to asm10 and tune other minimap2 options.

  2. Fix the +/- directions in the PAF file generated in step 1 to match the direction on the reference.
    cornetto fixasm asm.fasta asm.paf -r asm.report.tsv -w asm.fix.paf > asm.fix.fasta
    

    Here, the inputs are asm.fasta and asm.paf. asm.report.tsv, asm.fix.paf and asm.fix.fasta are the outputs.

  3. Generate the dot plot.
    cornetto minidot asm.fix.paf -f 2 > asm.eps
    

    You may convert the eps file to pdf to view.
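
Cornetto does not prescribe a tool for the EPS to PDF conversion. One common option is Ghostscript's ps2pdf with -dEPSCrop, which crops the page to the EPS bounding box; epstopdf (shipped with TeX distributions) is another. The sketch below tries them in that order. The function name eps_to_pdf is made up for illustration, and it assumes at least one of the two tools is installed.

```shell
#!/bin/sh
# Hypothetical helper: convert an EPS file to PDF with whichever of
# ps2pdf (Ghostscript) or epstopdf is available.
# Arguments: <input.eps> [output.pdf]
eps_to_pdf() {
    in=$1
    out=${2:-${in%.eps}.pdf}   # default: same name with a .pdf extension
    if [ ! -f "$in" ]; then
        echo "no such file: $in" >&2
        return 1
    fi
    if command -v ps2pdf >/dev/null 2>&1; then
        ps2pdf -dEPSCrop "$in" "$out"
    elif command -v epstopdf >/dev/null 2>&1; then
        epstopdf --outfile="$out" "$in"
    else
        echo "neither ps2pdf nor epstopdf was found on the PATH" >&2
        return 1
    fi
}

# Example: eps_to_pdf asm.eps    (writes asm.pdf)
```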

Per-chromosome evaluation

cornetto asmstats asm.paf asm.windows.0.4.50kb.ends.bed -r asm.report.tsv

Here, asm.paf and asm.report.tsv are the files generated in the section “Generate a dot plot step by step” above. asm.windows.0.4.50kb.ends.bed is an output from scripts/telostats.sh.

The output, which is printed to stdout, is explained here.

Usage of C programme

Our Cornetto C programme contains a number of subtools in addition to the ones explained above. For more details of each and every command, see the manual page.