cornetto

Cornetto Toolkit

Compiling the Cornetto C programme

Building the Cornetto C programme requires a compiler that supports the C99 standard (with X/Open 7 POSIX 2008 extensions); such compilers are widely available. To build:

sudo apt-get install zlib1g-dev   #install zlib development libraries
git clone https://github.com/hasindu2008/cornetto
cd cornetto
make

The commands to install the zlib development libraries on some popular distributions:

On Debian/Ubuntu : sudo apt-get install zlib1g-dev
On Fedora/CentOS : sudo dnf/yum install zlib-devel
On OS X : brew install zlib

Using helper scripts

Telomere stats

To get the telomere statistics of your assembly (vertebrates with the TTAGGG telomere sequence), use the script scripts/telostats.sh. This script uses cornetto subcommands that implement the functionality of the telomere analysis scripts from the VGP project. If your assembly is asm.fasta:

scripts/telostats.sh asm.fasta

Output:

Example: If you run this on the HG002 Q100 genome, you should see 46 contigs with telomeres at both ends.

scripts/telostats.sh hg002v1.0.1.fasta

$ contigs with 2 telo:       46

Dot plot

To generate a dot plot of the mapping between your assembly and a reference genome, use scripts/minidotplot.sh. This script requires minimap2 and samtools. If your assembly is asm.fasta and the reference is ref.fasta:

scripts/minidotplot.sh ref.fasta asm.fasta
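Because the script depends on minimap2 and samtools being on your PATH, a quick pre-flight check can save a failed run. The following is a minimal sketch only; the helper name have_tools is hypothetical and is not part of cornetto.

```shell
# Hypothetical helper (not part of cornetto): return 0 if every named tool
# is found on PATH, otherwise list the missing ones on stderr and return 1.
have_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Check the two tools that minidotplot.sh needs before running it.
if have_tools minimap2 samtools; then
    echo "all prerequisites found"
else
    echo "install the tools listed above first"
fi
```

The same helper can be reused for any other external tool your own scripts rely on.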

Output:

Examples:

# dot plot of a hifiasm primary assembly against the chm13
awk '/^S/{print ">"$2;print $3}' asm.p_ctg.gfa > asm.fasta
scripts/minidotplot.sh chm13.fa asm.fasta

# dot plot of a hifiasm primary assembly against a flat assembly of HG002 Q100
samtools faidx hg002v1.0.1.fasta
grep "PATERNAL\|chrEBV\|chrM\|chrX\|chrY" hg002v1.0.1.fasta.fai | cut -f 1 > paternal.txt
samtools faidx hg002v1.0.1.fasta -r paternal.txt -o hg002v1.0.1_pat.fasta
scripts/minidotplot.sh hg002v1.0.1_pat.fasta asm.fasta

# dot plot of hifiasm hap1+hap2 against HG002 Q100 diploid reference
awk '/^S/{print ">"$2;print $3}' asm.hap1.p_ctg.gfa > asm.hap1.fasta
awk '/^S/{print ">"$2;print $3}' asm.hap2.p_ctg.gfa > asm.hap2.fasta
cat asm.hap1.fasta asm.hap2.fasta > asm.hap1+hap2.fasta
scripts/minidotplot.sh hg002v1.0.1.fasta asm.hap1+hap2.fasta
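To see what the awk one-liner in the examples above does, it can be tried on a toy GFA file first. This is an illustrative sketch: the file names toy.gfa and toy.fasta and the sequences are made up, and a real hifiasm GFA carries more tags per line.

```shell
# Build a tiny tab-separated GFA with two segments (S lines) and one link.
# All names and sequences here are invented for illustration.
tmpdir=$(mktemp -d)
printf 'H\tVN:Z:1.0\n'                           >  "$tmpdir/toy.gfa"
printf 'S\tptg000001l\tACGTACGT\tLN:i:8\n'       >> "$tmpdir/toy.gfa"
printf 'S\tptg000002l\tTTAGGGTTAGGG\tLN:i:12\n'  >> "$tmpdir/toy.gfa"
printf 'L\tptg000001l\t+\tptg000002l\t+\t0M\n'   >> "$tmpdir/toy.gfa"

# Same one-liner as in the examples: every S line becomes one FASTA record,
# with column 2 as the header and column 3 as the sequence.
awk '/^S/{print ">"$2;print $3}' "$tmpdir/toy.gfa" > "$tmpdir/toy.fasta"
cat "$tmpdir/toy.fasta"

# Sanity check: the number of FASTA headers should equal the number of S lines.
n_seg=$(grep -c '^S' "$tmpdir/toy.gfa")
n_rec=$(grep -c '^>' "$tmpdir/toy.fasta")
echo "segments: $n_seg, FASTA records: $n_rec"
```

Running the same header count on a real conversion is a cheap way to confirm that no contig was lost.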

If you want to create a flat

Assembly stats

To get per-chromosome statistics, use the script scripts/asmstats.sh. Run scripts/telostats.sh and scripts/minidotplot.sh on the same assembly first, because asmstats.sh reuses the files those scripts generate.

scripts/asmstats.sh asm.fasta

The output, which is printed to stdout, is explained here.

Examples:

# for a primary assembly called asm.fasta against chm13.fa
scripts/telostats.sh asm.fasta
scripts/minidotplot.sh chm13.fa asm.fasta
scripts/asmstats.sh asm.fasta

# for a primary assembly called asm.fasta against the flat hg002 we created above
scripts/telostats.sh asm.fasta
scripts/minidotplot.sh hg002v1.0.1_pat.fasta asm.fasta
scripts/asmstats.sh asm.fasta

# for a diploid assembly called asm.hap1+hap2.fasta against hg002v1.0.1.fasta diploid reference
scripts/telostats.sh asm.hap1+hap2.fasta
scripts/minidotplot.sh hg002v1.0.1.fasta asm.hap1+hap2.fasta
scripts/asmstats.sh asm.hap1+hap2.fasta
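Since the three scripts have to run in a fixed order, the sequence used in the examples above can be wrapped in a small function. This is a sketch under assumptions: eval_assembly and SCRIPTS_DIR are hypothetical names, not part of cornetto, and SCRIPTS_DIR is assumed to point at the repository's scripts directory.

```shell
# Hypothetical wrapper (not shipped with cornetto): run the three helper
# scripts in dependency order and stop at the first one that fails.
# SCRIPTS_DIR is an assumed variable; it defaults to ./scripts.
SCRIPTS_DIR=${SCRIPTS_DIR:-scripts}

eval_assembly() {
    ref=$1   # reference FASTA, e.g. chm13.fa
    asm=$2   # assembly FASTA, e.g. asm.fasta
    "$SCRIPTS_DIR/telostats.sh" "$asm" &&
    "$SCRIPTS_DIR/minidotplot.sh" "$ref" "$asm" &&
    "$SCRIPTS_DIR/asmstats.sh" "$asm"
}

# Usage (commented out so that sourcing this snippet runs nothing):
# eval_assembly chm13.fa asm.fasta
```

Chaining with && means asmstats.sh is never started when one of the earlier steps fails.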

Using individual commands

Generate a dot plot step by step

  1. First map your assembly to the reference using minimap2.
    minimap2 -t16 --eqx -cx asm5 ref.fasta asm.fasta > asm.paf
    

    ref.fasta and asm.fasta are the files described under minidotplot.sh above. You may want to change the preset to asm10 and tune other minimap2 options.

  2. Fix the +/- directions in the PAF file generated in step 1 to match the direction on the reference.
    cornetto fixasm asm.fasta asm.paf -r asm.report.tsv -w asm.fix.paf > asm.fix.fasta
    

    Here, the inputs are asm.fasta and asm.paf; the outputs are asm.report.tsv, asm.fix.paf and asm.fix.fasta.

asm.fix.paf will contain the alignments from asm.paf with the +/- directions fixed to match those in the reference. asm.fix.fasta will contain the contigs from asm.fasta with their directions corrected to match the reference. The contigs are also renamed after the chromosome to which the majority of each contig mapped. asm.report.tsv will contain a report like the example below, giving for each contig the chromosome to which the majority of the contig mapped, whether the direction was swapped, and the new name assigned to the contig.

ptg000001l	chr3	+	chr3_0
ptg000002l	chr13	+	chr13_0
ptg000003l	chr15	+	chr15_0
ptg000004l	chr5	-	chr5_0
  3. Generate the dot plot.
    cornetto minidot asm.fix.paf -f 2 > asm.eps
    

    You may convert the eps file to pdf to view.
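The report written in step 2 is a four-column table (original contig, chromosome, direction, new name), so it can be summarised with standard text tools. The sketch below recreates the four example rows in a temporary file. It assumes the columns are tab-separated and that a "-" in the third column marks a contig whose direction was swapped; check both against your own asm.report.tsv.

```shell
tmpdir=$(mktemp -d)
report="$tmpdir/asm.report.tsv"

# The four example rows from the report shown in step 2, tab-separated:
# original contig, chromosome, direction, new contig name.
printf 'ptg000001l\tchr3\t+\tchr3_0\n'   >  "$report"
printf 'ptg000002l\tchr13\t+\tchr13_0\n' >> "$report"
printf 'ptg000003l\tchr15\t+\tchr15_0\n' >> "$report"
printf 'ptg000004l\tchr5\t-\tchr5_0\n'   >> "$report"

# Contigs whose direction was swapped (assumption: "-" in column 3).
flipped=$(awk -F'\t' '$3 == "-" {print $1 " -> " $4}' "$report")
echo "flipped contigs: $flipped"

# Number of contigs assigned to each chromosome.
per_chr=$(awk -F'\t' '{n[$2]++} END {for (c in n) print c, n[c]}' "$report" | sort)
echo "$per_chr"
```

On a real report, a chromosome with many assigned contigs is a quick indicator of a fragmented assembly for that chromosome.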

Per-chromosome evaluation

cornetto asmstats asm.paf asm.windows.0.4.50kb.ends.bed -r asm.report.tsv

Here, asm.paf and asm.report.tsv are the files generated in the section “Generate a dot plot step by step” above, and asm.windows.0.4.50kb.ends.bed is an output from scripts/telostats.sh.
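Because this command relies on three files produced by earlier steps, a small guard can catch a missing or empty input before cornetto is invoked. This is a sketch only; inputs_ready is a hypothetical helper, not part of cornetto.

```shell
# Hypothetical guard (not part of cornetto): succeed only if every named
# file exists and is non-empty; report the first problem on stderr.
inputs_ready() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage sketch with the file names from this section (commented out so that
# the snippet does not require cornetto to be installed):
# inputs_ready asm.paf asm.windows.0.4.50kb.ends.bed asm.report.tsv &&
#     cornetto asmstats asm.paf asm.windows.0.4.50kb.ends.bed -r asm.report.tsv
```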

The output, which is printed to stdout, is explained here.

Usage of C programme

Our Cornetto C programme contains a number of subtools that are used by the scripts explained above. If you want to use those subtools in your own scripts, see the manual page.