cornetto

Usage of the C programme

Create panels

noboringbits

This programme loads the whole depth file into memory and therefore needs tens of gigabytes of RAM. It is not memory-optimized because the assembly process itself already requires several hundred gigabytes of RAM, so users are expected to have access to a machine with a large amount of memory.

Options:

cornetto noboringbits prints the coordinates of regions that meet any of the following criteria:

  1. contigs smaller than 1 Mbase
  2. 100 kbase edge regions at each end of a contig
  3. windows that meet any of the following criteria:
    • low coverage: mean coverage < [0.4]x the genome average
    • high coverage: mean coverage > [2.5]x the genome average
    • low mappability: mean MAPQ20 coverage of the window < [0.4]x the mean total coverage of the window
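The three criteria can be sketched in C as follows. This is a minimal illustration under stated assumptions, not cornetto's implementation: the macro and function names are hypothetical, "genome average" is taken to be the genome-wide mean coverage, MAPQ20 coverage is assumed to mean coverage from reads with MAPQ >= 20, and the bracketed values are treated as fixed constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants mirroring the values in the list above. */
#define SMALL_CONTIG_LEN 1000000L /* contigs < 1 Mbase are reported */
#define EDGE_LEN         100000L  /* 100 kbase at each contig end */
#define LOW_COV_FRAC     0.4      /* < 0.4x genome-average coverage */
#define HIGH_COV_FRAC    2.5      /* > 2.5x genome-average coverage */
#define LOW_MAPQ_FRAC    0.4      /* MAPQ20 coverage < 0.4x window coverage */

/* Criterion 1: is the contig small enough to be reported? */
static bool contig_is_small(long contig_len)
{
    return contig_len < SMALL_CONTIG_LEN;
}

/* Criterion 2: 0-based, half-open coordinates of the two edge regions,
 * [0, *left_end) and [*right_start, contig_len). For contigs shorter
 * than EDGE_LEN both regions span the whole contig. */
static void edge_regions(long contig_len, long *left_end, long *right_start)
{
    *left_end    = contig_len < EDGE_LEN ? contig_len : EDGE_LEN;
    *right_start = contig_len > EDGE_LEN ? contig_len - EDGE_LEN : 0;
}

/* Criterion 3: should a window be reported? win_cov is the window's mean
 * total coverage, win_mq20 its mean MAPQ20 coverage (assumed: reads with
 * MAPQ >= 20) and genome_avg the genome-wide mean coverage. */
static bool window_is_interesting(double win_cov, double win_mq20, double genome_avg)
{
    assert(genome_avg > 0.0);
    if (win_cov < LOW_COV_FRAC * genome_avg)  return true; /* low coverage */
    if (win_cov > HIGH_COV_FRAC * genome_avg) return true; /* high coverage */
    if (win_mq20 < LOW_MAPQ_FRAC * win_cov)   return true; /* low mappability */
    return false;
}
```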

Example usage:

./cornetto noboringbits test/cov-total.bg -q test/cov-mq20.bg > noboringbits.txt

bigenough

cornetto bigenough [options] <assembly.bed> <boring.bed>

For each contig, if the total length of the regions listed in <boring.bed> covers more than T% of the contig's total length in <assembly.bed>, all of that contig's regions from <boring.bed> are included in the output.
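A minimal sketch of the per-contig decision, assuming the intervals in <boring.bed> do not overlap (so their lengths can simply be summed) and that T is given as a percentage. The struct and function names are hypothetical; cornetto's own BED parsing and output are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One BED interval (0-based, half-open). */
typedef struct {
    const char *contig;
    long start, end;
} bed_rec_t;

/* Returns true if the intervals in `boring` that lie on `contig` cover
 * more than t_pct percent of contig_len (the length from assembly.bed).
 * Assumes the intervals do not overlap. */
static bool contig_is_big_enough(const char *contig, long contig_len,
                                 const bed_rec_t *boring, int n, double t_pct)
{
    long covered = 0;
    assert(contig_len > 0);
    for (int i = 0; i < n; i++)
        if (strcmp(boring[i].contig, contig) == 0)
            covered += boring[i].end - boring[i].start;
    return covered * 100.0 > t_pct * (double)contig_len;
}
```

If the function returns true, every record of that contig in <boring.bed> would be written out.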

Options:


fixasm

This programme processes a FASTA file and a PAF alignment file to fix the orientation of contigs: each contig is oriented according to whether its total aligned length is greater on the positive or the negative strand. It writes the orientation-fixed FASTA to stdout and logs missing sequences to stderr.

Input:

Output:

Options:

Algorithm:

  1. Parse the PAF file to calculate the total positive and negative alignment lengths for each contig.
  2. Reverse-complement contigs whose total negative-strand alignment length exceeds their positive-strand alignment length.
  3. Write the fixed contigs to stdout.
  4. Log sequences missing from the PAF file if requested.
  5. Write the fixed PAF file if requested.
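Steps 1 and 2 can be sketched in C as below. This is an illustration, not cornetto's code: it assumes the alignment length of a PAF record is its query span (query end minus query start, PAF columns 3 and 4), uses a small fixed-size table instead of a hash map, and omits the missing-sequence log and fixed PAF output (steps 4 and 5). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_CONTIGS 1024 /* toy fixed-size table for this sketch */

typedef struct {
    char name[256];
    long pos_len, neg_len; /* summed query spans per strand */
} strand_tally_t;

static strand_tally_t tally[MAX_CONTIGS];
static int n_tally = 0;

/* Finds or creates the tally entry for a contig (linear search). */
static strand_tally_t *get_tally(const char *name)
{
    for (int i = 0; i < n_tally; i++)
        if (strcmp(tally[i].name, name) == 0) return &tally[i];
    assert(n_tally < MAX_CONTIGS);
    strand_tally_t *t = &tally[n_tally++];
    snprintf(t->name, sizeof t->name, "%s", name);
    t->pos_len = t->neg_len = 0;
    return t;
}

/* Step 1: adds one PAF line to the tallies. Columns used:
 * 1 query name, 2 query length, 3 query start, 4 query end, 5 strand. */
static int add_paf_line(const char *line)
{
    char qname[256], strand;
    long qlen, qstart, qend;
    if (sscanf(line, "%255s %ld %ld %ld %c", qname, &qlen, &qstart, &qend, &strand) != 5)
        return -1;
    strand_tally_t *t = get_tally(qname);
    if (strand == '-') t->neg_len += qend - qstart;
    else               t->pos_len += qend - qstart;
    return 0;
}

/* Step 2 decision: flip when the negative strand dominates (ties kept). */
static bool should_flip(const strand_tally_t *t)
{
    return t->neg_len > t->pos_len;
}

static char comp(char c)
{
    switch (c) {
    case 'A': return 'T'; case 'T': return 'A';
    case 'C': return 'G'; case 'G': return 'C';
    case 'a': return 't'; case 't': return 'a';
    case 'c': return 'g'; case 'g': return 'c';
    default:  return c;   /* N and other IUPAC codes left unchanged */
    }
}

/* Step 2: reverse-complements a NUL-terminated sequence in place. */
static void revcomp(char *seq)
{
    size_t n = strlen(seq);
    for (size_t i = 0; i < n / 2; i++) {
        char tmp = comp(seq[i]);
        seq[i] = comp(seq[n - 1 - i]);
        seq[n - 1 - i] = tmp;
    }
    if (n % 2) seq[n / 2] = comp(seq[n / 2]);
}
```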

Example usage:

./cornetto fixasm assembly.fa asm2ref.paf -m missing_sequences.txt -r report.tsv -w fixed.paf > fixed_contigs.fasta

minidot

This subcommand generates a dot plot from a PAF file. From https://github.com/lh3/miniasm.

Options:

Example usage:

cornetto minidot -m 500 -i 0.9 -s 2000 -w 800 input.paf > output.eps

Eval

asmstats

This subprogram calculates per-chromosome assembly evaluation statistics. The output format is detailed here.

Options:

Example usage:

cornetto asmstats asm2ref.paf telomere.bed -r fixasm.report.tsv

Telomere

telowin

This subcommand analyses telomere windows in a genome assembly.

Options:

Example usage:

cornetto telowin input.telomere 99.9 0.4 > output.windows

telobreaks

This subcommand identifies telomere breaks in a genome assembly.

Inputs:

Example usage:

cornetto telobreaks assembly.lens assembly.sdust assembly.telomere > output.breaks

telofind

This subcommand identifies telomere sequences in a FASTA file.

Options:

Example usage:

cornetto telofind input.fasta > output.telomere
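The detection method used by telofind is not described here. As a hypothetical illustration only, one common approach is to measure how much of a window consists of the vertebrate telomeric repeat TTAGGG or its reverse complement CCCTAA. The sketch assumes uppercase sequence and counts non-overlapping exact matches; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Counts non-overlapping exact occurrences of `motif` in seq[start, end). */
static int count_motif(const char *seq, size_t start, size_t end, const char *motif)
{
    size_t m = strlen(motif);
    int count = 0;
    assert(m > 0);
    for (size_t i = start; i + m <= end; ) {
        if (strncmp(seq + i, motif, m) == 0) { count++; i += m; }
        else i++;
    }
    return count;
}

/* Fraction of the window seq[start, end) made up of TTAGGG or CCCTAA. */
static double telo_fraction(const char *seq, size_t start, size_t end)
{
    assert(end > start);
    int hits = count_motif(seq, start, end, "TTAGGG")
             + count_motif(seq, start, end, "CCCTAA");
    return (double)(hits * 6) / (double)(end - start);
}
```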

sdust

This subcommand identifies low-complexity regions in a FASTA file using the symmetric DUST algorithm. From https://github.com/lh3/sdust.

Options:

Example usage:

cornetto sdust -w 64 -t 20 input.fa > output.sdust
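For reference, the symmetric DUST score of a sequence with l triplets is the sum over every triplet t of c_t(c_t - 1)/2, divided by l - 1, where c_t is the number of times t occurs. The sketch below computes only that score for one window; it does not implement the perfect-interval search of SDUST, skips triplets containing non-ACGT bases (a simplification), and makes no claim about how the -t threshold is compared against the score. Function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Maps a nucleotide to 0..3, or -1 for anything else (e.g. N). */
static int nt2int(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:  return -1;
    }
}

/* DUST score of seq[0, len): sum over triplets of c_t(c_t - 1)/2,
 * divided by (l - 1) where l = len - 2 is the number of triplets. */
static double dust_score(const char *seq, size_t len)
{
    int counts[64] = {0};
    double sum = 0.0;
    size_t l;
    assert(len >= 4); /* need at least two triplets so that l - 1 > 0 */
    l = len - 2;
    for (size_t i = 0; i < l; i++) {
        int a = nt2int(seq[i]), b = nt2int(seq[i + 1]), c = nt2int(seq[i + 2]);
        if (a < 0 || b < 0 || c < 0) continue; /* skip triplets with N etc. */
        int t = a << 4 | b << 2 | c;
        /* Adding the count before incrementing accumulates
         * 0 + 1 + ... + (c_t - 1) = c_t(c_t - 1)/2 per triplet. */
        sum += counts[t];
        counts[t]++;
    }
    return sum / (double)(l - 1);
}
```

A homopolymer scores highest (64 A's give 62 identical triplets, so 62 * 61 / 2 / 61 = 31), while a sequence whose triplets are all distinct scores 0.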

Misc

fa2bed

Creates a BED file of assembly contig lengths.

Example usage:

cornetto asmbed <assembly.fasta>
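A minimal sketch of the same idea, assuming a three-column BED output (name, 0, length) and taking the contig name as the header text up to the first whitespace. cornetto's exact output format may differ, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads FASTA from `in` and writes one BED line per record:
 * name <TAB> 0 <TAB> length. Assumes header lines are shorter than the
 * buffer; long sequence lines are fine because every chunk is counted. */
static void fasta_to_bed(FILE *in, FILE *out)
{
    static char line[65536];
    char name[1024] = "";
    long len = 0;
    int have = 0;
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '>') {
            if (have) fprintf(out, "%s\t0\t%ld\n", name, len);
            sscanf(line + 1, "%1023s", name);
            len = 0;
            have = 1;
        } else {
            len += (long)strlen(line);
        }
    }
    if (have) fprintf(out, "%s\t0\t%ld\n", name, len);
}
```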

seq

Extracts reads whose length is equal to or larger than a threshold from a FASTQ file.

Options:

Example usage:

cornetto seq <reads.fastq>
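A minimal sketch of the length filter, assuming unwrapped four-line FASTQ records with lines shorter than 1 MiB. The function name and the min_len parameter are hypothetical and do not correspond to documented cornetto options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies FASTQ records whose sequence length is >= min_len from `in` to
 * `out`. Assumes 4-line records with no line wrapping. Returns the
 * number of records written. */
static long filter_fastq(FILE *in, FILE *out, size_t min_len)
{
    static char hdr[1 << 20], seq[1 << 20], plus[1 << 20], qual[1 << 20];
    long kept = 0;
    assert(in && out);
    while (fgets(hdr, sizeof hdr, in) && fgets(seq, sizeof seq, in) &&
           fgets(plus, sizeof plus, in) && fgets(qual, sizeof qual, in)) {
        size_t len = strcspn(seq, "\r\n"); /* length without newline */
        if (len >= min_len) {
            fputs(hdr, out);
            fputs(seq, out);
            fputs(plus, out);
            fputs(qual, out);
            kept++;
        }
    }
    return kept;
}
```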