Summary of SLOW5 ASCII format

This is a summary of the latest version of the SLOW5 ASCII file format (.slow5). For the full specification and information on the SLOW5 binary format (called BLOW5), refer to the PDF links here.

A SLOW5 ASCII file is a plain text file that uses the American Standard Code for Information Interchange (ASCII) encoding (locale: C/POSIX, code set: US-ASCII). The file extension is .slow5. A SLOW5 file contains a header followed by the sequencing data. An example structure of a SLOW5 ASCII file with a single read group and an example structure of a SLOW5 ASCII file with multiple read groups (i.e., multiple sequencing runs) are provided below. Any column/row borders, spacing and cell colours in the examples are added only to increase readability. The actual format uses tabs (‘\t’) and newlines (‘\n’) as delimiters.

Example of a SLOW5 ASCII file with a single read group:

#slow5_version	1.0.0
#num_read_groups	1
@asic_id	0004A30B00232BEC
@exp_start_time	2020-01-01T00:00:00Z
@flow_cell_id	FAH00000
@run_id	855cdb
…	…
#char*	uint32_t	double	double	double	double	uint64_t	int16_t*	…
#read_id	read_group	digitisation	offset	range	sampling_rate	len_raw_signal	raw_signal	…
read0	0	8192	6	1467.6	4000	123456	498,492,…	…
read1	0	8192	5	1467.6	4000	2000	491,491,…	…
…	…	…	…	…	…	…	…	…
readN	0	8192	3	1467.6	4000	3000	400,400,…	…

Example of a SLOW5 ASCII file with multiple read groups:

#slow5_version	1.0.0
#num_read_groups	3
@asic_id	0004A30B00232BEC	1004A30B00232BEC	2004A30B00232BEC
@exp_start_time	2020-01-01T00:00:00Z	2020-01-01T00:00:00Z	2020-01-01T00:00:00Z
@flow_cell_id	FAH00000	FAH00001	FAH00002
@run_id	855cdb	855cd1	855cdc
…	…	…	…
#char*	uint32_t	double	double	double	double	uint64_t	int16_t*	…
#read_id	read_group	digitisation	offset	range	sampling_rate	len_raw_signal	raw_signal	…
read-0	1	8192	6	1467.6	4000	4000	498,492,…	…
read-1	0	8192	5	1467.6	4000	2000	491,491,…	…
…	…	…	…	…	…	…	…	…
read-N	2	8192	3	1467.6	4000	3000	400,400,…	…

SLOW5 Header

The SLOW5 header stores metadata regarding the experiment. Header lines start with either ‘#’ or ‘@’. The header contains two parts: the global header and the data header.

Global header

The header lines that start with ‘#’ form the global header.

The first line of a SLOW5 ASCII file is a key-value pair that specifies the SLOW5 version. The key is separated from the value using a tab ‘\t’.
The second line specifies the number of read groups in the file. Observe that in the single read group file example (Table 1), the value for num_read_groups is set to 1. In the second example with three read groups (Table 2) the value is set to 3.
The last line of the header is always the field names for the subsequent per-read records.
The second last line of the header specifies the data types of each field for the subsequent per-read records (i.e., for the fields named in the last line of the header). Further information about the fields is provided in the SLOW5 Data section below.

Data header

The header lines that start with ‘@’ form the data header. These header lines contain ONT data attributes that are shared across multiple reads in a sequencing run (read group). For instance, the run_id and the flow_cell_id are common to all the reads in the read group and are therefore stored in the data header.
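The header layout described above can be sketched in Python as follows. This is a minimal illustration, not part of any SLOW5 library: the function name `parse_slow5_header` and the shortened sample text (including its three made-up signal values) are assumptions for the example. It relies only on the rules stated above: header lines start with ‘#’ or ‘@’, the last two header lines hold the data types and field names, and each ‘@’ line carries one tab-separated value per read group.

```python
from itertools import takewhile


def parse_slow5_header(lines):
    """Split the header of a SLOW5 ASCII file into its parts (illustrative sketch)."""
    # The header is the leading run of lines that start with '#' or '@'.
    header = [ln.rstrip("\n")
              for ln in takewhile(lambda ln: ln.startswith(("#", "@")), lines)]
    # The second-last and last header lines give the data types and field names.
    types = header[-2].lstrip("#").split("\t")
    names = header[-1].lstrip("#").split("\t")
    global_header, data_header = {}, {}
    for ln in header[:-2]:
        key, *values = ln.split("\t")
        if key.startswith("#"):
            global_header[key[1:]] = values[0]
        else:
            # '@' line: one value per read group, in read group order.
            data_header[key[1:]] = values
    return global_header, data_header, dict(zip(names, types))


# Shortened sample with three read groups (signal values invented for illustration).
example = (
    "#slow5_version\t1.0.0\n"
    "#num_read_groups\t3\n"
    "@flow_cell_id\tFAH00000\tFAH00001\tFAH00002\n"
    "@run_id\t855cdb\t855cd1\t855cdc\n"
    "#char*\tuint32_t\tdouble\tdouble\tdouble\tdouble\tuint64_t\tint16_t*\n"
    "#read_id\tread_group\tdigitisation\toffset\trange\tsampling_rate"
    "\tlen_raw_signal\traw_signal\n"
    "read-0\t1\t8192\t6\t1467.6\t4000\t3\t498,492,495\n"
)

global_header, data_header, columns = parse_slow5_header(
    example.splitlines(keepends=True)
)
```

With this sketch, `data_header["run_id"][1]` would give the run_id of read group 1, which is how a record's read_group value can be used to look up its run-level attributes.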

SLOW5 Data

After the SLOW5 header, the actual data is encoded. Each line contains information about a single read and we refer to this as a record.

Primary fields

These fields are mandatory and must be arranged in the order that they appear below:

Col	Field name	Data type	Description	Example value
1	read_id	char*	A unique identifier for the read.	00592138-f120-4ab5-9916-c5567adb8e29
2	read_group	uint32_t	Read group identifier.	0
3	digitisation	double	Number of quantisation levels in the Analog to Digital Converter (ADC). That is, if the ADC is 12 bit, digitisation is 4096 (2¹²).	8192
4	offset	double	The ADC offset error. This value is added when converting the signal to pico amperes.	10
5	range	double	The full scale measurement range in pico amperes.	1441.389893
6	sampling_rate	double	Sampling frequency of the ADC, i.e., the number of data points collected per second.	4000
7	len_raw_signal	uint64_t	The number of samples in the raw signal (length of the raw_signal vector below).	59676
8	raw_signal	int16_t*	The raw signal, i.e., the direct acquisition values from the ADC, comma separated.	1039,588,588,593,586,…
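As a rough illustration of the table above, the eight primary fields of one record could be read in Python like this. The names `Slow5Record` and `parse_record` are hypothetical, the sample line uses invented signal values, and any auxiliary columns after the eighth are simply ignored in this sketch.

```python
from typing import List, NamedTuple


class Slow5Record(NamedTuple):
    """Primary fields of one SLOW5 ASCII record, in their mandatory order."""
    read_id: str
    read_group: int
    digitisation: float
    offset: float
    range: float
    sampling_rate: float
    len_raw_signal: int
    raw_signal: List[int]


def parse_record(line):
    """Parse the primary fields from one tab-separated record line."""
    cols = line.rstrip("\n").split("\t")
    # raw_signal is a comma-separated list of integer ADC values.
    signal = [int(x) for x in cols[7].split(",")]
    rec = Slow5Record(cols[0], int(cols[1]), float(cols[2]), float(cols[3]),
                      float(cols[4]), float(cols[5]), int(cols[6]), signal)
    # len_raw_signal should equal the number of samples actually present.
    if rec.len_raw_signal != len(rec.raw_signal):
        raise ValueError("len_raw_signal does not match raw_signal length")
    return rec


rec = parse_record("read0\t0\t8192\t6\t1467.6\t4000\t3\t498,492,495\n")
```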

Primary fields contain all the information required for a typical nanopore signal-level analysis. The raw signal can be converted to pico amperes using the following equation:

signal_in_pico_ampere = (raw_signal + offset) * range / digitisation
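The equation above can be applied sample by sample. The following Python sketch does so; the function name `raw_to_picoampere` and the parameter name `range_` (chosen to avoid shadowing Python's built-in `range`) are assumptions for the example.

```python
def raw_to_picoampere(raw_signal, digitisation, offset, range_):
    """Apply (raw_signal + offset) * range / digitisation to every sample."""
    return [(x + offset) * range_ / digitisation for x in raw_signal]


# Field values taken from the read0 row of the single read group example.
pa = raw_to_picoampere([498, 492], digitisation=8192.0, offset=6.0, range_=1467.6)
```

For the first sample, (498 + 6) * 1467.6 / 8192 comes to about 90.29 pico amperes.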

Auxiliary fields

These fields are optional and not bound by any strict order. Following are some common auxiliary data fields in SLOW5 format:

Field name	Data type	Description	Example value
channel_number	char*	The channel number. A flow cell has multiple channels allowing multiple DNA/RNA strands to be sequenced in parallel. For instance, a MinION flow cell has 512 channels and thus can sequence 512 strands in parallel.	504
median_before	double	The estimated median current level immediately preceding the read. In most cases this can be used as an estimate of the open pore level. The open-pore state is when there is no strand inside the pore.	238.78225708007812
read_number	int32_t	A unique number within each channel counted upwards from zero. Note that not all reads generated are “strand” reads, but only strand reads are written to the final fast5 file, so some read numbers may be absent.	17981
start_mux	uint8_t	The MUX setting for the channel when the read began. Each channel contains one or more wells. For instance, a MinION flow cell has 4 wells per channel. The wells within a channel are connected to a multiplexer (MUX), a switch that controls which of the four wells in the channel is controlled and read out for sequencing.	4
start_time	uint64_t	The start time of the read. The unit for start_time is ‘number of signal samples’, so start_time has to be divided by sampling rate (sampling_rate) to get the start time in seconds (i.e. the time since the run was started).	335845487
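The unit conversion described for start_time can be sketched in Python as below. The helper name `samples_to_seconds` is hypothetical. Applying the same division to len_raw_signal to estimate the read duration is an inference from the field definitions, not something stated in this summary, and combining example values from the two tables is for illustration only.

```python
def samples_to_seconds(num_samples, sampling_rate):
    """Convert a count of signal samples to seconds."""
    return num_samples / sampling_rate


# start_time example value from the auxiliary table, with sampling_rate = 4000.
start_s = samples_to_seconds(335845487, 4000.0)

# Assumed analogue: read duration from len_raw_signal (example value 59676).
duration_s = samples_to_seconds(59676, 4000.0)
```

With these example values the read starts roughly 83961 seconds (a little over 23 hours) into the run and lasts about 14.9 seconds.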

Please cite the following in your publications when using the S/BLOW5 file format:

Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4

@article{gamaarachchi2022fast,
  title={Fast nanopore sequencing data analysis with SLOW5},
  author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},
  journal={Nature biotechnology},
  volume={40},
  pages={1026--1029},
  year={2022},
  publisher={Nature Publishing Group}
}
