Cornetto noboringbits prints coordinate windows that meet any of the following:
This program loads the whole depth file into memory, so it may need tens of gigabytes of RAM. It is not memory-optimised because the assembly process already requires several hundred gigabytes of RAM, so the user is expected to have access to a computer with a large amount of RAM.
Options:
-q FILE
: depth file with high mapq read coverage
-w INT
: window size [default: 2500]
-i INT
: window increment [default: 50]
-L FLOAT
: low coverage threshold factor [default: 0.4]
-H FLOAT
: high coverage threshold factor [default: 2.5]
-Q FLOAT
: mapq low coverage threshold factor [default: 0.4]
-m INT
: minimum contig length [default: 1000000]
-e INT
: edge length to ignore [default: 100000]
-h
: help
--verbose INT
: verbosity level [default: 4]
--version
: print version
Example usage:
./cornetto noboringbits test/cov-total.bg -q test/cov-mq20.bg > noboringbits.txt
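As a rough illustration of how the window, increment and threshold options could interact, here is a minimal Python sketch. The flagging criteria are assumptions, because the list of conditions is not reproduced here: a window is flagged when its mean depth is below `low` times the overall mean depth, above `high` times it, or when its high-mapq mean depth is below `low_hq` times the overall mean. The function name and the in-memory per-base depth lists are hypothetical, and the minimum contig length and edge options are not modelled; cornetto's own implementation may differ.

```python
def flag_windows(depth, depth_hq, window=2500, step=50,
                 low=0.4, high=2.5, low_hq=0.4):
    """Return (start, end) windows whose mean depth looks unusual.

    depth    -- per-base total depth for one contig
    depth_hq -- per-base depth from high-mapq reads (same length)
    The three criteria below are assumptions made for illustration.
    """
    if len(depth) != len(depth_hq):
        raise ValueError("depth tracks must have the same length")
    if not depth:
        return []
    mean = sum(depth) / len(depth)
    flagged = []
    # Slide a fixed-size window along the contig in steps of `step`.
    for start in range(0, len(depth) - window + 1, step):
        end = start + window
        w_mean = sum(depth[start:end]) / window
        w_mean_hq = sum(depth_hq[start:end]) / window
        if (w_mean < low * mean or w_mean > high * mean
                or w_mean_hq < low_hq * mean):
            flagged.append((start, end))
    return flagged


# Toy contig: 50 bases with a 10-base coverage dropout in the middle.
toy = [10] * 20 + [0] * 10 + [10] * 20
print(flag_windows(toy, toy, window=10, step=10))
```

The sketch recomputes each window sum from scratch for clarity; a running sum would be the natural optimisation for real depth files.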
cornetto bigenough [options] <assembly.bed> <boring.bed>
For each contig, if the total length of the regions listed in
Options:
-r FILE
: also output in readfish format to FILE
-T INT
: percentage threshold to consider as sufficient boring bits on a contig [default: 50]
This program processes a FASTA file and a PAF alignment file to fix the direction of contigs, depending on whether more of the total aligned base length is on the positive or the negative strand. It outputs the direction-fixed contigs in FASTA format to stdout.
Input:
<assembly.fa>
: Input FASTA file containing the assembly to be fixed for contig direction.
<asm_to_ref.paf>
: Input PAF file containing alignments of the assembly to a reference.
Output:
stdout.
Options:
-m FILE
: write missing contig names to FILE
-r FILE
: write report to FILE
-w FILE
: write fixed PAF to FILE
Algorithm:
stdout.
Example usage:
./cornetto fixasm assembly.fa asm2ref.paf -m missing_sequences.txt -r report.tsv -w fixed.paf > fixed_contigs.fasta
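The direction-fixing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not cornetto's code: the "base length" of each alignment is taken to be the query span (PAF columns 4 minus 3), contigs with no alignments are left unchanged, and all function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_votes(paf_lines):
    """Sum aligned query bases per contig and strand from PAF records."""
    votes = defaultdict(lambda: {"+": 0, "-": 0})
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        name, strand = fields[0], fields[4]
        if strand not in ("+", "-"):
            continue
        # Assumption: use the query span as the alignment's base length.
        votes[name][strand] += int(fields[3]) - int(fields[2])
    return votes


def fix_direction(contigs, paf_lines):
    """Reverse-complement contigs whose '-' total exceeds their '+' total."""
    votes = strand_votes(paf_lines)
    fixed = {}
    for name, seq in contigs.items():
        v = votes.get(name)
        if v is not None and v["-"] > v["+"]:
            fixed[name] = reverse_complement(seq)
        else:
            fixed[name] = seq  # unaligned or forward: keep as is
    return fixed
```

Ties are resolved in favour of the original orientation here, which is also an assumption.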
This subcommand generates a dot plot from a PAF file. From https://github.com/lh3/miniasm.
Options:
-m INT
: minimum match length [default: 100]
-i FLOAT
: minimum identity [default: 0.1]
-s INT
: minimum span [default: 1000]
-w INT
: image width [default: 600]
-f INT
: font size [default: 11]
-L
: do not print labels
-D
: do not align hits to the diagonal
Example usage:
cornetto minidot -m 500 -i 0.9 -s 2000 -w 800 input.paf > output.eps
This subprogram calculates per-chromosome assembly evaluation statistics. Output is detailed here.
Options:
-r FILE
: report file generated from fixasm
-s STR
: use the sort order specified by STR when printing the chromosome report. STR can be human1 for haploid human chromosome names, human2 for diploid human chromosome names, or a FASTA file to read the chromosome order from.
Example usage:
cornetto asmstats asm2ref.paf telomere.bed -r fixasm.report.tsv
This subprogram prints a table that can be used directly to draw an Nx or NGx plot. The first column is the cumulative contig length as a percentage; the second column is the contig length.
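For readers unfamiliar with Nx tables, the two columns can be reproduced with a short Python sketch. It assumes contigs are sorted from longest to shortest and that the percentage is taken against the total contig length, or against a genome size when one is supplied (giving NGx). The function name `nx_table` is hypothetical.

```python
def nx_table(lengths, genome_size=None):
    """Return (cumulative percentage, contig length) rows.

    With genome_size=None the percentages are relative to the total
    contig length (Nx); otherwise to genome_size (NGx).
    """
    total = genome_size if genome_size is not None else sum(lengths)
    if total <= 0:
        raise ValueError("total length must be positive")
    rows = []
    cumulative = 0
    # Longest contigs first, as is conventional for Nx curves.
    for length in sorted(lengths, reverse=True):
        cumulative += length
        rows.append((100.0 * cumulative / total, length))
    return rows


for pct, length in nx_table([20, 50, 30]):
    print(f"{pct:.1f}\t{length}")
```

Reading the table, N50 is the contig length on the first row whose percentage reaches 50.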
Options:
-g STR
: genome size (e.g. 3.1G). If specified, the output is NGx. If unspecified, the total contig length is used, giving Nx.
Example usage:
This subcommand analyses telomere windows in a genome assembly.
Inputs:
<input_file>
: Input file containing telomere regions.
<identity>
: Identity percentage (e.g., 99.9).
<threshold>
: Threshold for telomere detection.
Example usage:
cornetto telowin input.telomere 99.9 0.4 > output.windows
This subcommand identifies telomere breaks in a genome assembly.
Inputs:
<lens_file>
: File containing contig lengths.
<sdust_file>
: File containing low-complexity regions.
<telomere_file>
: File containing telomere regions.
Example usage:
cornetto telobreaks assembly.lens assembly.sdust assembly.telomere > output.breaks
This subcommand identifies telomere sequences in a FASTA file.
Inputs:
<input.fasta>
: Input FASTA file.
[sequence]
: Optional sequence to search for (default: TTAGGG).
Example usage:
cornetto telofind input.fasta > output.telomere
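The general idea of searching for a telomere motif can be illustrated with a small Python sketch that reports runs of tandem copies of the motif. This is not telofind's algorithm or output format: the `min_copies` parameter and the function name are hypothetical, only exact copies on the given strand are matched, and the reverse-complement motif is not searched.

```python
import re


def find_repeat_runs(seq, motif="TTAGGG", min_copies=3):
    """Return (start, end) spans of at least min_copies tandem motif copies."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_copies))
    # Upper-case the sequence so soft-masked (lower-case) bases still match.
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


example = "ACGT" + "TTAGGG" * 4 + "ACGT"
print(find_repeat_runs(example))
```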
This subcommand identifies low-complexity regions in a FASTA file using the symmetric DUST algorithm. From https://github.com/lh3/sdust.
Options:
-w INT
: window size [default: 64]
-t INT
: threshold [default: 20]
Example usage:
cornetto sdust -w 64 -t 20 input.fa > output.sdust
Creates a BED file with assembly contig lengths.
Example usage:
cornetto asmbed assembly.fasta
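What such a BED file could look like is shown by the following Python sketch, which computes each contig's length from FASTA text and emits one line per contig. The three-column layout (name, 0, length) is an assumption based on the BED convention of 0-based starts, and the function names are hypothetical.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            # Keep only the first word of the header as the contig name.
            name, length = (header[0] if header else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def to_bed(lines):
    """Return BED lines covering each contig from 0 to its length."""
    return ["%s\t0\t%d" % (name, length) for name, length in fasta_lengths(lines)]


print("\n".join(to_bed([">c1 desc", "ACGT", "AC", ">c2", "GG"])))
```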
Extracts reads with length equal to or longer than a threshold from a FASTQ file.
Options:
-m INT
: min length [default: 30000]
Example usage:
cornetto seq reads.fastq
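The length filter itself is simple enough to sketch in Python. The version below assumes strict four-line FASTQ records (no wrapped sequence lines) and uses a hypothetical function name; it keeps a read when its sequence length is at least `min_len`.

```python
def filter_fastq(lines, min_len=30000):
    """Yield 4-line FASTQ records whose sequence length is >= min_len."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # record[1] is the sequence line of the four-line record.
            if len(record[1]) >= min_len:
                yield tuple(record)
            record = []


reads = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "II"]
for rec in filter_fastq(reads, min_len=3):
    print("\n".join(rec))
```

Because the function is a generator over lines, it can be fed an open file handle without loading the whole FASTQ into memory.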