Datasets directly in S/BLOW5 format

R10.4.1 5kHz - DNA

NA24385 R10.4.1 LSK114 PromethION (5kHz) 40X coverage

An NA24385 R10.4.1 LSK114 dataset with ~40X coverage, sequenced on a PromethION at a 5kHz sampling rate, is available at the links below:

Description ENA run Data access Direct download link (md5sum)
chr22 reads subset (BLOW5 format)   PGXXSX240041_reads_chr22.blow5
~19M reads complete PromethION dataset (BLOW5 format) ERR12997168 PGXXSX240041_reads.blow5 (b9e0f4fc49ffe4d1e39dc9c09eccdeac), PGXXSX240041_reads.blow5.idx (18ac205e53552bcb561ea5b3a55cd9b7) *

*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS Open Data Sponsorship Programme. Documentation for the bucket is available in the gtgseq GitHub repository.
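The md5sums listed next to each download can be used to check that a file arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_md5` are hypothetical, and the file name and checksum in the example are the ones listed above for the complete 40X dataset. It assumes the file has already been downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat, which matters for BLOW5 files
    that can be hundreds of gigabytes in size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 digest with a published md5sum (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # File name and md5sum as listed in the table above; the local path is an assumption.
    blow5 = Path("PGXXSX240041_reads.blow5")
    expected = "b9e0f4fc49ffe4d1e39dc9c09eccdeac"
    if blow5.exists():
        print("OK" if matches_md5(blow5, expected) else "MISMATCH")
    else:
        print(f"{blow5} not found; download it first")
```

The same check applies to the accompanying `.blow5.idx` file, which has its own md5sum in the table.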

NA24385 R10.4.1 LSK114 PromethION (5kHz) 20X coverage

An NA24385 R10.4.1 LSK114 dataset with ~20X coverage, sequenced on a PromethION at a 5kHz sampling rate, is available at the links below:

Description ENA run Data access Direct download link (md5sum)
~20K reads subsubset (BLOW5 format)   PGXXXX230339_reads_20k.blow5 (d4bb9a40eb89647c2bb74b724d63cef4)
~500K reads subset (BLOW5 format)   PGXXXX230339_reads_500k.blow5 (4cfe7a3ab4fbb45f87fd4ddc3c0b6eca)
chr22 reads subset (BLOW5 format)   PGXXXX230339_reads_chr22.blow5
~12M reads complete PromethION dataset (BLOW5 format) ERR12997167 PGXXXX230339_reads.blow5 (2dab2e0c042b0fb5f9f3794c7c916420), PGXXXX230339_reads.blow5.idx (84a1b5317f0e92f73143070481df8fe3) *

*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS Open Data Sponsorship Programme. Documentation for the bucket is available in the gtgseq GitHub repository.

A few more R10.4.1 5kHz

RNA004 - RNA

UHR RNA004 PromethION direct-RNA data

Universal human reference RNA (300 ng polyA-enriched RNA) direct-RNA data, sequenced on a PromethION, is available at the links below:

Description ENA run Data access Direct download link (md5sum)
~20K reads subsubset (BLOW5 format)   PNXRXX240011_reads_20k.blow5
~500K reads subset (BLOW5 format)   PNXRXX240011_reads_500k.blow5
~15M reads complete PromethION dataset (BLOW5 format) ERR12997170 PNXRXX240011_reads.blow5 (671be5b88f2b54a9e22ced351493b7a9), PNXRXX240011_reads.blow5.idx (e3ea326d300a22008e2821ce10d17649) *

*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS Open Data Sponsorship Programme. Documentation for the bucket is available in the gtgseq GitHub repository.

A few more RNA004 direct-RNA

R10.4.1 4kHz - DNA

NA24385 R10.4.1 LSK114 PromethION (4kHz)

An NA24385 R10.4.1 LSK114 dataset sequenced on a PromethION is available on SRA at the links below:

Description SRA/ENA run Data access Direct download link (md5sum)
~20K reads subsubset (BLOW5 format)   hg2_prom_lsk114_subsubsample.tar (4d338e1cffd6dbf562cc55d9fcca040c)
~500K reads subset (BLOW5 format) SRR23215365 hg2_subsample_slow5.tar (65386e1da1d82b892677ad5614e8d84d)
chr22 reads subset (BLOW5 format)   PGXX22394_reads_chr22.blow5
~15M reads complete PromethION dataset (BLOW5 format) SRR23215366/ERR11777845 PGXX22394_reads.blow5 (3498b595ac7c79a3d2dce47454095610), PGXX22394_reads.blow5.idx (1e11735c10cf63edc4a7114f010cc472)*

*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS Open Data Sponsorship Programme. Documentation for the bucket is available in the gtgseq GitHub repository.

NA12878 R10.4.1 LSK114 PromethION (4kHz)

An NA12878 R10.4.1 LSK114 dataset sequenced on a PromethION at a 4kHz sampling rate is available at the links below:

Description ENA run Data access Direct download link (md5sum)
chr22 reads subset (BLOW5 format)   PGXXHX230142_reads_chr22.blow5
~11M reads complete PromethION dataset (BLOW5 format) ERR11777844 PGXXHX230142_reads.blow5 (24266f6dabb8d679f7f520be6aa22694), PGXXHX230142_reads.blow5.idx (a5659f829b9410616391427b2526b853) *

*This dataset is hosted in the gtgseq AWS bucket, granted by the AWS Open Data Sponsorship Programme. Documentation for the bucket is available in the gtgseq GitHub repository.

More R10.4.1 4kHz datasets

R9.4.1

NA12878 R9.4.1 PromethION

The NA12878 R9.4.1 PromethION dataset sequenced for the SLOW5 paper is available on SRA at the links below:

Description SRA run Data access Direct download link (md5sum)
~20K reads subsubset - NA12878_prom_subsubsample.tar.gz (f64074151d25d6e35c73f668d4146032)
~500K reads subset SRR22186403 subsample_slow5.tar (6cdbe02c3844960bb13cf94b9c3173bb)
~9M reads complete PromethION dataset SRR22186402 na12878_prom_merged.blow5 (7e1a5900aff10e2cf1b97b8d3c6ecd1e), na12878_prom_merged.blow5.idx (a78919e8ac8639788942dbc3f1a2451a)
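Several of the subsets above are distributed as `.tar` or `.tar.gz` archives rather than as bare BLOW5 files. The sketch below shows one way to inspect and unpack such an archive with the Python standard library. The internal layout of the archives is not described here, so the code assumes only that they contain files ending in `.blow5` or `.blow5.idx`; the function names are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path

# Assumed suffixes of interest; the archive layout itself is not documented here.
BLOW5_SUFFIXES = (".blow5", ".blow5.idx")


def list_blow5_members(archive_path):
    """Return the names of BLOW5 (and index) files inside a tar archive."""
    # Mode "r:*" lets tarfile detect plain, gzip, bzip2 or xz compression.
    with tarfile.open(archive_path, "r:*") as archive:
        return [
            member.name
            for member in archive
            if member.isfile() and member.name.endswith(BLOW5_SUFFIXES)
        ]


def extract_blow5(archive_path, out_dir):
    """Copy BLOW5 (and index) files out of a tar archive into out_dir.

    Files are written under their base name only, so directory components
    stored in the archive are ignored. Members that share a base name
    would overwrite each other.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(BLOW5_SUFFIXES)):
                continue
            target = out_dir / Path(member.name).name
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For example, `list_blow5_members("subsample_slow5.tar")` would show which files the ~500K reads archive holds before anything is written to disk.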

MinION R9.4.1 selective sequencing datasets

MinION datasets sequenced with readfish selective sequencing for the paper "Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing" are available on SRA.

Converted R9.4.1 public datasets

The following public datasets from others have been converted to BLOW5 format. Relatively small datasets (hundreds of GBs) are directly available for download. Larger datasets (terabytes) have been uploaded to SRA and are available for cloud delivery. These converted BLOW5 files are also currently stored locally in archive storage at the Garvan Institute; if you are interested in accessing them, please get in contact.

  1. SP1 SARS-CoV-2 dataset:
  2. Some of the Zymo Mock community data:
  3. All raw nanopore data from Telomere-to-telomere consortium CHM13 project
    • BLOW5 files available from SRR23371619. filename: CHM13_T2T_ONT_blow5.tar (md5sum: 04f9d1c6ea2d11ccfc131c8244f059d3)
  4. All nanopore-wgs-consortium datasets:
    • BLOW5 files for the DNA dataset available from SRR23513620. filename: na12878_DNA_blow5.tar (md5sum: 2d02a7706d00572dcd9fcfa96e0357f4)
    • BLOW5 files for the direct-RNA dataset available from SRR23513624. filename: na12878_directRNA_blow5.tar (md5sum: 282e305f2b6a72d28980a8d5c803d54e). Also available for direct download: na12878_rna_merged.blow5 (md5sum: 36bc164e9d885838245073f6cd2ecd79), na12878_rna_merged.blow5.idx (md5sum: 82f96208ac2f42574abe0cf5a3954602)
    • BLOW5 files for the cDNA-RNA dataset available from SRR23513622. filename: na12878_cDNA_blow5.tar (md5sum: cba2ce651d8c33528e594a9e45ff6515)
  5. All RNA datasets from the Singapore Nanopore-Expression Project (SG-NEx) are available in BLOW5 format in the sg-nex-data-blow5 AWS S3 bucket. We gratefully acknowledge Jonathan Göke and Chen Ying for their openness and for hosting the BLOW5 files through AWS Open Data. Please see the SG-NEx_blow5_tutorial for how to use and analyse this data.