This programme loads the whole depth file into memory and therefore needs tens of gigabytes of RAM. It is not memory-optimized because the assembly process already requires several hundred gigabytes of RAM, so the user is expected to have access to a computer with a large amount of RAM.
Options:
-q FILE
: depth file with high mapq read coverage
-w INT
: window size [default: 2500]
-i INT
: window increment [default: 50]
-L FLOAT
: low coverage threshold factor [default: 0.4]
-H FLOAT
: high coverage threshold factor [default: 2.5]
-Q FLOAT
: mapq low coverage threshold factor [default: 0.4]
-m INT
: minimum contig length [default: 1000000]
-e INT
: edge length to ignore [default: 100000]
-h
: help
--verbose INT
: verbosity level [default: 4]
--version
: print version

Cornetto noboringbits prints coordinate windows that meet any of the following:
Example usage:
./cornetto noboringbits test/cov-total.bg -q test/cov-mq20.bg > noboringbits.txt
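The window size (-w) and window increment (-i) options suggest a sliding-window scan over each contig. The selection criteria themselves are not listed here, so the Python sketch below only illustrates one plausible reading: a window is flagged when its mean depth falls below a low factor, or above a high factor, times the contig-wide mean depth. The helper name `flag_windows`, the choice of the contig mean as the baseline, and the toy numbers are assumptions for illustration, not cornetto's implementation.

```python
from typing import List, Tuple


def flag_windows(depth: List[float], window: int, step: int,
                 low: float = 0.4, high: float = 2.5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows whose mean depth lies outside
    [low * contig_mean, high * contig_mean].

    Hypothetical helper: the baseline and the criteria are assumptions.
    """
    if window <= 0 or step <= 0 or len(depth) < window:
        return []
    contig_mean = sum(depth) / len(depth)

    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for d in depth:
        prefix.append(prefix[-1] + d)

    flagged = []
    for start in range(0, len(depth) - window + 1, step):
        end = start + window
        mean = (prefix[end] - prefix[start]) / window
        if mean < low * contig_mean or mean > high * contig_mean:
            flagged.append((start, end))
    return flagged
```

With the real defaults the window would span 2500 bases and advance by 50, so neighbouring windows overlap heavily; a real implementation would probably merge adjacent flagged windows before printing them.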
cornetto bigenough [options] <assembly.bed> <boring.bed>
For each contig, if the total length of the regions listed in
Options:
-r FILE
: also output in readfish format to FILE
-T INT
: percentage threshold to consider as sufficient boring bits on a contig [default: 50]

This programme processes a FASTA file and a PAF alignment file and fixes the direction of each contig according to whether its total aligned base length is greater in the positive or the negative direction. It outputs the direction-fixed FASTA to stdout and logs missing sequences to stderr.
Input:
<assembly.fa>
: Input FASTA file containing the assembly to be fixed for contig direction.
<asm_to_ref.paf>
: Input PAF file containing alignments of the assembly to a reference.

Output:
The direction-fixed FASTA is written to stdout.

Options:
-m FILE
: write missing contig names to FILE
-r FILE
: write report to FILE
-w FILE
: write fixed PAF to FILE

Algorithm:
stdout.

Example usage:
./cornetto fixasm assembly.fa asm2ref.paf -m missing_sequences.txt -r report.tsv -w fixed.paf > fixed_contigs.fasta
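The direction rule described for fixasm can be sketched in Python as a per-contig strand vote. This is a minimal illustration, not the tool's code: it assumes that "total base length" means the summed query span (PAF columns 3 and 4) of each alignment, that contigs absent from the PAF are kept unchanged and reported as missing, and that no mapping-quality filter is applied. The names `strand_votes` and `fix_directions` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_votes(paf_lines: Iterable[str]) -> Dict[str, int]:
    """Sum aligned query bases per contig: '+' hits add, '-' hits subtract."""
    votes: Dict[str, int] = defaultdict(int)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns; skip malformed lines
        name, strand = fields[0], fields[4]
        span = int(fields[3]) - int(fields[2])  # query end - query start
        votes[name] += span if strand == "+" else -span
    return dict(votes)


def fix_directions(contigs: Dict[str, str],
                   votes: Dict[str, int]) -> Tuple[Dict[str, str], List[str]]:
    """Reverse-complement contigs with a negative vote; list unaligned ones."""
    fixed: Dict[str, str] = {}
    missing: List[str] = []
    for name, seq in contigs.items():
        if name not in votes:
            missing.append(name)  # assumption: keep the sequence as-is
            fixed[name] = seq
        elif votes[name] < 0:
            fixed[name] = reverse_complement(seq)
        else:
            fixed[name] = seq
    return fixed, missing
```

A contig with 60 bases aligned on the negative strand and 40 on the positive strand gets a vote of -20 and is therefore reverse-complemented in this sketch.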
This subcommand generates a dot plot from a PAF file. From https://github.com/lh3/miniasm.
Options:
-m INT
: minimum match length [default: 100]
-i FLOAT
: minimum identity [default: 0.1]
-s INT
: minimum span [default: 1000]
-w INT
: image width [default: 600]
-f INT
: font size [default: 11]
-L
: do not print labels
-D
: do not align hits to the diagonal

Example usage:
cornetto minidot -m 500 -i 0.9 -s 2000 -w 800 input.paf > output.eps
This subprogram calculates per-chromosome assembly evaluation statistics. Output is detailed here.
Options:
-r FILE
: report file generated from fixasm
-s STR
: use the sort order specified by STR when printing the chromosome report. STR can be human1 for haploid human chromosome names, human2 for diploid human chromosome names, or a FASTA file to read the chromosome order from.

Example usage:
cornetto asmstats asm2ref.paf telomere.bed -r fixasm.report.tsv
This subcommand analyses telomere windows in a genome assembly.
Options:
<input_file>
: Input file containing telomere regions.
<identity>
: Identity percentage (e.g., 99.9).
<threshold>
: Threshold for telomere detection.

Example usage:
cornetto telowin input.telomere 99.9 0.4 > output.windows
This subcommand identifies telomere breaks in a genome assembly.
Inputs:
<lens_file>
: File containing contig lengths.
<sdust_file>
: File containing low-complexity regions.
<telomere_file>
: File containing telomere regions.

Example usage:
cornetto telobreaks assembly.lens assembly.sdust assembly.telomere > output.breaks
This subcommand identifies telomere sequences in a FASTA file.
Options:
<input.fasta>
: Input FASTA file.
[sequence]
: Optional sequence to search for (default: TTAGGG).

Example usage:
cornetto telofind input.fasta > output.telomere
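As a rough illustration of what searching a sequence for the TTAGGG motif involves, the Python sketch below reports runs of consecutive motif copies. How telofind actually scans, scores, and formats its hits is not described here; the helper name `find_repeat_runs`, the `min_copies` parameter, and the half-open coordinate convention are assumptions.

```python
import re
from typing import List, Tuple


def find_repeat_runs(seq: str, motif: str = "TTAGGG",
                     min_copies: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of at least `min_copies`
    back-to-back copies of `motif` in `seq` (case-insensitive).

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(
        "(?:%s){%d,}" % (re.escape(motif), min_copies), re.IGNORECASE
    )
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]
```

Telomeres at the opposite end of a contig appear as the reverse complement (CCCTAA for the default motif), so a fuller version would presumably run the same search with that motif as well.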
This subcommand identifies low-complexity regions in a FASTA file using the symmetric DUST algorithm. From https://github.com/lh3/sdust.
Options:
-w INT
: window size [default: 64]
-t INT
: threshold [default: 20]

Example usage:
cornetto sdust -w 64 -t 20 input.fa > output.sdust
Creates a BED file with the assembly contig lengths.
Example usage:
cornetto asmbed <assembly.fasta>
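Conceptually, a BED file of contig lengths has one line per contig that spans the whole sequence. The Python sketch below shows the idea, assuming a three-column BED (name, start 0, end equal to the contig length) and taking the contig name as the first whitespace-delimited token of the FASTA header; the exact columns that asmbed prints are not stated here, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def contig_lengths(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name = None
    length = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)  # sequences may wrap over several lines
    if name is not None:
        yield name, length


def to_bed(fasta_lines: Iterable[str]) -> List[str]:
    """Format each contig as a BED line covering the whole contig."""
    return ["%s\t0\t%d" % (n, l) for n, l in contig_lengths(fasta_lines)]
```

Because BED intervals are zero-based and half-open, the end coordinate equals the contig length.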
Extracts reads whose length is equal to or larger than a threshold from a FASTQ file.
Options:
-m INT
: min length [30000]

Example usage:
cornetto seq <reads.fastq>
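The length filter can be pictured with the short Python sketch below. It assumes plain four-line FASTQ records (no wrapped sequence lines) and keeps a read when its sequence length is at least `min_len`; the function name `filter_fastq` is hypothetical, and this is an illustration rather than the subcommand's actual code.

```python
from typing import Iterable, Iterator, List


def filter_fastq(lines: Iterable[str],
                 min_len: int = 30000) -> Iterator[List[str]]:
    """Yield four-line FASTQ records whose sequence length is >= min_len."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # record[1] is the sequence line of the four-line record
            if len(record[1]) >= min_len:
                yield record
            record = []
```

For example, with `min_len=3` a 4-base read is kept and a 2-base read is dropped.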