Datasets directly in S/BLOW5 format
Table of Contents
- R10.4.1 5kHz - DNA
- RNA004 - RNA
- R10.4.1 4kHz - DNA
- R9.4.1
R10.4.1 5kHz - DNA
NA24385 R10.4.1 LSK114 PromethION (5KHz) 40X coverage
An NA24385 R10.4.1 LSK114 dataset with ~40X coverage sequenced on a PromethION at 5KHz sampling rate is available at the links below:
Description | ENA run Data access | Direct download link (md5sum) |
---|---|---|
chr22 reads subset (BLOW5 format) | - | PGXXSX240041_reads_chr22.blow5 (``) |
~19M reads complete PromethION dataset (BLOW5 format) | ERR12997168 | PGXXSX240041_reads.blow5 (b9e0f4fc49ffe4d1e39dc9c09eccdeac ), PGXXSX240041_reads.blow5.idx (18ac205e53552bcb561ea5b3a55cd9b7 ) * |
*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS open data sponsorship programme; documentation is available under the gtgseq GitHub repository.
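Each direct download is listed with its md5sum, so a downloaded file can be checked before use. The following is a minimal sketch in Python, assuming the file has already been downloaded to the working directory. The helper names `md5_of_file` and `matches_expected` and the local path are illustrative and not part of any tool referenced here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for very large BLOW5 files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 against the value listed in the table."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the expected value is the one listed for the
    # complete PromethION dataset in the table above.
    local_file = Path("PGXXSX240041_reads.blow5")
    expected = "b9e0f4fc49ffe4d1e39dc9c09eccdeac"
    if local_file.exists():
        print("OK" if matches_expected(local_file, expected) else "MISMATCH")
    else:
        print(f"{local_file} not found; download it first")
```

The same check applies to the `.blow5.idx` files and to the other datasets on this page, using the md5sum listed next to each file.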
NA24385 R10.4.1 LSK114 PromethION (5KHz) 20X coverage
An NA24385 R10.4.1 LSK114 dataset with ~20X coverage sequenced on a PromethION at 5KHz sampling rate is available at the links below:
Description | ENA run Data access | Direct download link (md5sum) |
---|---|---|
~20K reads subsubset (BLOW5 format) | - | PGXXXX230339_reads_20k.blow5 (d4bb9a40eb89647c2bb74b724d63cef4 ) |
~500K reads subset (BLOW5 format) | - | PGXXXX230339_reads_500k.blow5 (4cfe7a3ab4fbb45f87fd4ddc3c0b6eca ) |
chr22 reads subset (BLOW5 format) | - | PGXXXX230339_reads_chr22.blow5 (``) |
~12M reads complete PromethION dataset (BLOW5 format) | ERR12997167 | PGXXXX230339_reads.blow5 (2dab2e0c042b0fb5f9f3794c7c916420 ), PGXXXX230339_reads.blow5.idx (84a1b5317f0e92f73143070481df8fe3 ) * |
*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS open data sponsorship programme; documentation is available under the gtgseq GitHub repository.
A few more R10.4.1 5kHz
- An NA24385 R10.4.1 LSK114 dataset sequenced on a MinION at 5KHz is available through ENA at ERR12997169
- An NA24385 R10.4.1 duplex dataset sequenced on a PromethION at 5KHz is available through ENA at ERR13475640
RNA004 - RNA
UHR RNA004 PromethION direct-RNA data
Universal human reference RNA (300 ng polyA enriched RNA) sequenced on a PromethION is available from the following links:
Description | ENA run Data access | Direct download link (md5sum) |
---|---|---|
~20K reads subsubset (BLOW5 format) | - | PNXRXX240011_reads_20k.blow5 (``) |
~500K reads subset (BLOW5 format) | - | PNXRXX240011_reads_500k.blow5 (``) |
~15M reads complete PromethION dataset (BLOW5 format) | ERR12997170 | PNXRXX240011_reads.blow5 (671be5b88f2b54a9e22ced351493b7a9 ), PNXRXX240011_reads.blow5.idx (e3ea326d300a22008e2821ce10d17649 ) * |
*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS open data sponsorship programme; documentation is available under the gtgseq GitHub repository.
A few more RNA004 direct-RNA
- Another PromethION universal human reference RNA sample (1.5 ug of total RNA, done without polyA enrichment) is available through ENA at ERR12997171
- A universal human reference RNA sample (294 ng polyA enriched RNA) with 2% SIRV spiked in (6 ng), sequenced on a MinION, is available through ENA at ERR12997172
R10.4.1 4kHz - DNA
NA24385 R10.4.1 LSK114 PromethION (4KHz)
An NA24385 R10.4.1 LSK114 dataset sequenced on a PromethION at 4KHz sampling rate is available on SRA at the links below:
Description | SRA/ENA run Data access | Direct download link (md5sum) |
---|---|---|
~20K reads subsubset (BLOW5 format) | - | hg2_prom_lsk114_subsubsample.tar (4d338e1cffd6dbf562cc55d9fcca040c ) |
~500K reads subset (BLOW5 format) | SRR23215365 | hg2_subsample_slow5.tar (65386e1da1d82b892677ad5614e8d84d ) |
chr22 reads subset (BLOW5 format) | - | PGXX22394_reads_chr22.blow5 (``) |
~15M reads complete PromethION dataset (BLOW5 format) | SRR23215366/ERR11777845 | PGXX22394_reads.blow5 (3498b595ac7c79a3d2dce47454095610 ), PGXX22394_reads.blow5.idx (1e11735c10cf63edc4a7114f010cc472 )* |
*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS open data sponsorship programme; documentation is available under the gtgseq GitHub repository.
NA12878 R10.4.1 LSK114 PromethION (4KHz)
An NA12878 R10.4.1 LSK114 dataset sequenced on a PromethION at 4KHz sampling rate is available at the links below:
Description | ENA run Data access | Direct download link (md5sum) |
---|---|---|
chr22 reads subset (BLOW5 format) | - | PGXXHX230142_reads_chr22.blow5 |
~11M reads complete PromethION dataset (BLOW5 format) | ERR11777844 | PGXXHX230142_reads.blow5 (24266f6dabb8d679f7f520be6aa22694 ), PGXXHX230142_reads.blow5.idx (a5659f829b9410616391427b2526b853 ) * |
*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS open data sponsorship programme; documentation is available under the gtgseq GitHub repository.
More R10.4.1 4kHz datasets
- Human methylated and non-methylated (WGA) DNA datasets from the Zymo DNA methylation standards (D5013) are available on ENA under PRJEB64592. The BLOW5 files are in PGXX22562_methylated.tar.gz and PGXX22563_nonmethylated.tar.gz.
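Several datasets on this page are distributed as tar archives rather than as single BLOW5 files. The internal layout of those archives is not described here, so the sketch below simply lists whichever members end in `.blow5`. The function name `list_blow5_members` is hypothetical, and the archive path is assumed to point at a file you have already downloaded.

```python
import os
import tarfile


def list_blow5_members(archive_path):
    """Return the names of .blow5 files inside a tar archive."""
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(archive_path, "r:*") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".blow5")
        ]


if __name__ == "__main__":
    # Assumed local copy of one of the archives named above.
    archive = "PGXX22562_methylated.tar.gz"
    if os.path.exists(archive):
        for name in list_blow5_members(archive):
            print(name)
    else:
        print(f"{archive} not found; download it first")
```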
R9.4.1
NA12878 R9.4.1 PromethION
The NA12878 R9.4.1 PromethION dataset sequenced for the SLOW5 paper is available on SRA and links are given below:
Description | SRA run Data access | Direct download link (md5sum) |
---|---|---|
~20K reads subsubset | - | NA12878_prom_subsubsample.tar.gz (f64074151d25d6e35c73f668d4146032 ) |
~500K reads subset | SRR22186403 | subsample_slow5.tar (6cdbe02c3844960bb13cf94b9c3173bb ) |
~9M reads complete PromethION dataset | SRR22186402 | na12878_prom_merged.blow5 (7e1a5900aff10e2cf1b97b8d3c6ecd1e ), na12878_prom_merged.blow5.idx (a78919e8ac8639788942dbc3f1a2451a ) |
MinION R9.4.1 selective sequencing datasets
MinION datasets sequenced with readfish selective sequencing for the paper "Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing" are available on SRA.
- tar files without “_reads” at the end (e.g., GBXM047265.tar) are BLOW5 data
- tar files with _reads at the end (e.g., GBXM047265_reads.tar) are FAST5 data
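The naming rule above can be applied programmatically when sorting through a batch of downloaded archives. This is a small sketch of that rule only; the function name `tar_content_type` is illustrative, and it assumes every file follows the convention described in the two bullets.

```python
from pathlib import PurePath


def tar_content_type(tar_name):
    """Guess the raw-signal format of a tar file from its name.

    Follows the naming rule stated above: a name ending in "_reads"
    (before the .tar extension) means FAST5, anything else means BLOW5.
    """
    name = PurePath(tar_name).name
    if not name.endswith(".tar"):
        raise ValueError(f"not a .tar file name: {tar_name}")
    stem = name[: -len(".tar")]
    return "FAST5" if stem.endswith("_reads") else "BLOW5"


for example in ("GBXM047265.tar", "GBXM047265_reads.tar"):
    print(example, "->", tar_content_type(example))
```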
Converted R9.4.1 public datasets
The following public datasets from others have been converted to BLOW5 format. Relatively small datasets (hundreds of GBs) are directly available for download. Larger datasets (terabytes) have been uploaded to SRA and are available for cloud delivery. These converted BLOW5 files are also currently stored locally in archive storage at the Garvan Institute; if you are interested, please contact us.
- SP1 SARS-CoV-2 dataset:
  - SP1-raw-mapped.blow5 (md5sum: d87c60f70bf8646ee56bcee2795e7535)
  - SP1-raw-mapped.blow5.idx (md5sum: c79ef9280be63fad7c07e4352402ce7a)
- Some of the Zymo Mock community data:
  - Zymo-GridION-EVEN-BB-SN.blow5 (md5sum: d7c894164aef398907adc6c034dd3049)
  - Zymo-GridION-EVEN-BB-SN.blow5.idx (md5sum: d7d5feae1107c6d4517ebb416dc02683)
- All raw nanopore data from the Telomere-to-telomere consortium CHM13 project:
  - BLOW5 files available from SRR23371619. File name: CHM13_T2T_ONT_blow5.tar (md5sum: 04f9d1c6ea2d11ccfc131c8244f059d3).
- All nanopore-wgs-consortium datasets:
  - BLOW5 files for the DNA dataset available from SRR23513620. File name: na12878_DNA_blow5.tar (md5sum: 2d02a7706d00572dcd9fcfa96e0357f4)
  - BLOW5 files for the direct-RNA dataset available from SRR23513624. File name: na12878_directRNA_blow5.tar (md5sum: 282e305f2b6a72d28980a8d5c803d54e). Also available for direct download as na12878_rna_merged.blow5 (md5sum: 36bc164e9d885838245073f6cd2ecd79), na12878_rna_merged.blow5.idx (md5sum: 82f96208ac2f42574abe0cf5a3954602)
  - BLOW5 files for the cDNA-RNA dataset available from SRR23513622. File name: na12878_cDNA_blow5.tar (md5sum: cba2ce651d8c33528e594a9e45ff6515)
- All RNA datasets from the Singapore Nanopore-Expression Project (SG-NEx) are available in BLOW5 format in the sg-nex-data-blow5 AWS S3 bucket. We gratefully acknowledge Jonathan Göke and Chen Ying for being open and for hosting the BLOW5 files through AWS Open Data. Please see SG-NEx_blow5_tutorial for how to use and analyse this data.