Compression of Nanopore FASTQ Files

Guillermo Dufort y Álvarez, Gadiel Seroussi, Pablo Smircich, José Sotelo, Idoia Ochoa, Álvaro Martín

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The research and development of tools for genomic data compression has so far focused on data generated by second-generation sequencing technologies, while third-generation technologies, such as nanopore sequencing, have received little attention in the data compression research community. In this paper, we investigate compression schemes for nanopore FASTQ files. We propose a nanopore quality score compressor, called DualCtx, which yields significant improvements in compression performance over the state of the art. We also extend DualCtx to a full FASTQ compressor, termed DualFqz, by substituting DualCtx for the quality score compression module in a variant of Fqzcomp. We tested DualFqz and various existing compressors on a large nanopore data set. The results show that DualFqz achieves the best compression performance. The experiments also show that most current compressor implementations fail to execute correctly on files with long, variable-length reads. DualCtx and DualFqz are freely available for download at: https://github.com/guidufort/DualFqz.

Original language: English (US)
Title of host publication: Bioinformatics and Biomedical Engineering - 7th International Work-Conference, IWBBIO 2019, Proceedings
Editors: Francisco Ortuño, Ignacio Rojas, Fernando Rojas, Olga Valenzuela
Publisher: Springer-Verlag Berlin Heidelberg
Pages: 36-47
Number of pages: 12
ISBN (Print): 9783030179373
DOI: https://doi.org/10.1007/978-3-030-17938-0_4
State: Published - Jan 1 2019
Event: 7th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2019 - Granada, Spain
Duration: May 8 2019 to May 10 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11465 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2019
Country: Spain
City: Granada
Period: 5/8/19 to 5/10/19

Keywords

  • FASTQ compression
  • Genomic data compression
  • Nanopore sequencing technology

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Dufort y Álvarez, G., Seroussi, G., Smircich, P., Sotelo, J., Ochoa, I., & Martín, Á. (2019). Compression of Nanopore FASTQ Files. In F. Ortuño, I. Rojas, F. Rojas, & O. Valenzuela (Eds.), Bioinformatics and Biomedical Engineering - 7th International Work-Conference, IWBBIO 2019, Proceedings (pp. 36-47). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11465 LNBI). Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-030-17938-0_4