Data used in the preprint/publication
FASTQ and BLOW5
The ENA project PRJEB86853 contains the raw data, with file name prefixes as listed in this tsv file here. In the tsv table:
- For PacBio HiFi data there should be one fastq.gz file.
- For ONT data there should be a fastq.gz file containing basecalled reads and a tar.gz file containing raw signal BLOW5 files.
- Note that the FASTQ files contain the reads that were given to the assembler, which are already filtered (e.g., pass reads only, qscore cutoff).
- FASTQ files for ONT duplex samples contain only duplex reads.
- For cornetto rounds on ONT simplex data, FASTQ files contain only reads >30 kbases.
- BLOW5 files contain everything that came out of the sequencing run without any filtering.
- Entries with T2TC inside brackets are those where we used publicly available data from the T2T Consortium’s human pangenomics AWS bucket. The links we used are:
- *hg002-Cornetto-x.1 is made of fastq0+fastq1; hg002-Cornetto-x.2 is made of fastq0+fastq1+fastq2; and so on.
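The file rules in the list above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the released data or tooling: the function names, the platform labels ("pacbio-hifi", "ont") and the idea of returning bare file suffixes are assumptions made for the example. The cumulative rule follows the note above, where round n pools fastq0 through fastq<n>.

```python
def cornetto_round_inputs(round_index: int) -> list[str]:
    """Return the FASTQ labels pooled for hg002-Cornetto-x.<round_index>.

    Hypothetical helper: round 1 is fastq0+fastq1, round 2 is
    fastq0+fastq1+fastq2, and so on.
    """
    if round_index < 1:
        raise ValueError("round_index must be >= 1")
    # Round n pools fastq0 through fastq<n>, inclusive.
    return [f"fastq{i}" for i in range(round_index + 1)]


def expected_suffixes(platform: str) -> list[str]:
    """Return the file suffixes expected for one prefix in the tsv table.

    The platform labels are assumptions for this sketch, not names
    used in the tsv file itself.
    """
    if platform == "pacbio-hifi":
        # One fastq.gz file per PacBio HiFi entry.
        return [".fastq.gz"]
    if platform == "ont":
        # Basecalled reads plus a tarball of raw signal BLOW5 files.
        return [".fastq.gz", ".tar.gz"]
    raise ValueError(f"unknown platform: {platform}")


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "+".join(cornetto_round_inputs(n)))
    print(expected_suffixes("ont"))
```

For example, `cornetto_round_inputs(2)` gives the three labels that make up hg002-Cornetto-x.2.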
FASTA assembly
HG002
The files in the table here are found inside the cornetto-hg002-asm directory created when you download and extract the cornetto-hg002-asm.tar.gz file from https://doi.org/10.5061/dryad.kkwh70sfr
Animals
The files in the table here are found inside the cornetto-animal-asm directory created when you download and extract the cornetto-animal-asm.tar.gz file from https://doi.org/10.5061/dryad.kkwh70sfr
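Once an archive has been downloaded manually from the Dryad page, it can be unpacked and its contents listed with a short Python sketch like the one below. The function name `extract_assemblies` is hypothetical, and the sketch assumes the archive sits in the working directory and that the top-level directory shares the archive's base name, as described above for cornetto-hg002-asm.tar.gz and cornetto-animal-asm.tar.gz.

```python
import tarfile
from pathlib import Path


def extract_assemblies(archive: str, dest: str = ".") -> list[str]:
    """Extract a .tar.gz archive into dest and list the extracted files.

    Assumes the archive unpacks into a directory named after the archive
    (e.g. cornetto-hg002-asm.tar.gz -> cornetto-hg002-asm/).
    """
    dest_path = Path(dest)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    top_dir = dest_path / Path(archive).name.removesuffix(".tar.gz")
    # Report file paths relative to dest, sorted for stable output.
    return sorted(
        p.relative_to(dest_path).as_posix()
        for p in top_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    for name in extract_assemblies("cornetto-hg002-asm.tar.gz"):
        print(name)
```

The same call works for cornetto-animal-asm.tar.gz; the returned paths can then be matched against the file names in the corresponding table.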