GABAC: An arithmetic coding solution for genomic data

Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S. Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez, John Hancock

Research output: Contribution to journal › Article

Abstract

Motivation: In response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the MPEG-G standard of the Moving Picture Experts Group (MPEG). This paper discusses the first implementation of an MPEG-G compliant entropy codec, GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.

Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.
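The abstract names binarization and context-adaptive binary arithmetic coding as two of GABAC's building blocks. The Python sketch below only illustrates how those two stages fit together; it is not GABAC's implementation and does not produce an MPEG-G bitstream. Assumptions, all hypothetical: order-0 Exp-Golomb binarization, an order-1 context (the previous bin) with Laplace-smoothed counts, exact rational arithmetic via `fractions.Fraction` in place of a finite-precision integer coder, and the number of bins sent as side information. Function and class names are made up for illustration.

```python
from fractions import Fraction
import math


def exp_golomb0(n):
    """Order-0 Exp-Golomb binarization: 0 -> '1', 1 -> '010', 3 -> '00100'."""
    code = bin(n + 1)[2:]                # binary digits of n + 1
    return "0" * (len(code) - 1) + code  # prefixed by len(code) - 1 zeros


def parse_exp_golomb0(bins):
    """Inverse binarization: split a bin string back into integers."""
    symbols, i = [], 0
    while i < len(bins):
        zeros = 0
        while bins[i] == "0":            # count the zero prefix
            zeros += 1
            i += 1
        symbols.append(int(bins[i:i + zeros + 1], 2) - 1)
        i += zeros + 1
    return symbols


class AdaptiveBinaryModel:
    """Hypothetical context model: Laplace-smoothed bin counts per context."""

    def __init__(self):
        self.counts = {}                 # context -> [zeros seen, ones seen]

    def p0(self, ctx):
        c0, c1 = self.counts.setdefault(ctx, [1, 1])
        return Fraction(c0, c0 + c1)     # estimated probability of a 0 bin

    def update(self, ctx, bit):
        self.counts[ctx][bit] += 1       # adapt after each coded bin


def encode_bins(bins):
    """Arithmetic-code a bin string; returns a dyadic codeword m / 2**k."""
    model, prev = AdaptiveBinaryModel(), 0
    low, width = Fraction(0), Fraction(1)
    for ch in bins:
        bit, p0 = int(ch), model.p0(prev)
        if bit == 0:                     # keep the lower sub-interval
            width *= p0
        else:                            # keep the upper sub-interval
            low += width * p0
            width *= 1 - p0
        model.update(prev, bit)
        prev = bit                       # order-1 context: previous bin
    k = 0                                # shortest m / 2**k in [low, low + width)
    while True:
        m = math.ceil(low * 2**k)
        if Fraction(m, 2**k) < low + width:
            return m, k
        k += 1


def decode_bins(m, k, nbins):
    """Mirror the encoder's interval splits to recover nbins bins."""
    model, prev, out = AdaptiveBinaryModel(), 0, []
    value = Fraction(m, 2**k)
    low, width = Fraction(0), Fraction(1)
    for _ in range(nbins):
        p0 = model.p0(prev)
        split = low + width * p0
        if value < split:
            bit, width = 0, width * p0
        else:
            bit, low, width = 1, split, width * (1 - p0)
        model.update(prev, bit)
        prev = bit
        out.append(str(bit))
    return "".join(out)


def compress(symbols):
    bins = "".join(exp_golomb0(s) for s in symbols)
    m, k = encode_bins(bins)
    return m, k, len(bins)


def decompress(m, k, nbins):
    return parse_exp_golomb0(decode_bins(m, k, nbins))


data = [0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1]
m, k, nbins = compress(data)
assert decompress(m, k, nbins) == data
print(f"{len(data)} symbols -> {nbins} bins -> {k} coded bits")
```

Exact fractions keep the sketch short and easy to check for invertibility, but their size grows with the input; practical CABAC engines instead work on fixed-width integer ranges with renormalization and table-driven probability updates.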

Original language: English (US)
Pages (from-to): 2275-2277
Number of pages: 3
Journal: Bioinformatics
Volume: 36
Issue number: 7
DOIs: https://doi.org/10.1093/bioinformatics/btz922
State: Published - Apr 1 2020

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics


Voges, J., Paridaens, T., Müntefering, F., Mainzer, L. S., Bliss, B., Yang, M., Ochoa, I., Fostier, J., Ostermann, J., Hernaez, M., & Hancock, J. (2020). GABAC: An arithmetic coding solution for genomic data. Bioinformatics, 36(7), 2275-2277. https://doi.org/10.1093/bioinformatics/btz922