S/BLOW5 Design Goals

S/BLOW5 is a file format designed for the bioinformatics research community and is built around three major design principles:

  1. Simplicity and usability
  2. Reading performance for common analysis workloads
  3. Efficient file size

Simplicity and usability

We believe that simplicity and usability translate into improved efficiency for scientists and are therefore as important as computational performance. This also makes BLOW5 an ideal choice for archiving: no ad hoc changes will be made, and even after 5 or 10 years its library can still be compiled on any system where a C compiler is available. By contrast, with a general-purpose format, especially one that depends on multiple heavyweight libraries, the deprecation of a single dependency down the chain could by then make compiling an earlier version a nightmare.

Reading performance for common analysis workloads

Efficient file size

Note that simplicity has sometimes been favoured over performance. For instance, slow5lib uses well-established and widely available system calls (read, write, pread64) rather than the latest asynchronous interfaces such as io_uring, because we believe compatibility is more important. Likewise, we do not use the latest compiler features, despite the possible performance gains, for the sake of compatibility and portability.
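
To illustrate why a plain positional read is often enough, here is a minimal sketch of fetching one record by offset with the pread system call, exposed in Python as os.pread on Unix-like systems. It is not slow5lib code and does not reflect the BLOW5 on-disk layout: the record contents, the in-memory index, and the helper names build_index and fetch are all hypothetical and exist only for this example.

```python
import os
import tempfile


def build_index(records):
    """Concatenate payloads and return (blob, {read_id: (offset, length)}).

    Hypothetical layout for illustration only, not the BLOW5 format.
    """
    blob = bytearray()
    index = {}
    for read_id, payload in records:
        index[read_id] = (len(blob), len(payload))
        blob += payload
    return bytes(blob), index


def fetch(fd, index, read_id):
    """Fetch one record with a positional read (pread).

    pread takes the offset as an argument, so the descriptor's own
    file position is left unchanged.
    """
    offset, length = index[read_id]
    data = os.pread(fd, length, offset)
    if len(data) != length:
        raise IOError("short read")
    return data


# Made-up records standing in for per-read data.
records = [("read_0", b"AAAA"), ("read_1", b"CCCCCC"), ("read_2", b"GG")]
blob, index = build_index(records)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(blob)
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    second = fetch(fd, index, "read_1")
    # The descriptor's position is still at the start of the file.
    position_after = os.lseek(fd, 0, os.SEEK_CUR)
finally:
    os.close(fd)
    os.remove(path)
```

Because the offset is passed explicitly, several threads can share one file descriptor without coordinating seeks, and the call is available on essentially any POSIX system, which is the kind of compatibility the paragraph above prioritises.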


In addition to the above, S/BLOW5 also features extendibility and flexibility.

Extendibility

Flexibility

Also, with the demise of Moore’s law, computer architectures are moving toward domain-specific systems (see the 2018 Turing Lecture by Turing Award winners Hennessy and Patterson, who are pioneers in computer architecture). We therefore believe that having a domain-specific format, rather than a general-purpose one, is a necessary first step in that direction.

Limitations

As a final note, S/BLOW5 is a community-developed format, so any comments, suggestions, contributions, critiques and questions are welcome.

Last, but not least, see Heng Li’s comment: slow5_lh3