ChIPWig: A random access-enabling lossless and lossy compression method for ChIP-seq data

Vida Ravanmehr, Minji Kim, Zhiying Wang, Olgica Milenkovic

Research output: Contribution to journal, Article, peer-review

Abstract

Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and result in massive datasets that introduce significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access and summary statistics lookups, and it is based on the asymptotic theory of optimal point density design for nonuniform quantizers.

Results: We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced the file sizes to merely 6% of the original, and offered a 6-fold improvement in compression rate compared to bigWig. The lossy feature further reduced file sizes 2-fold compared to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. Compression and decompression speeds are on the order of 0.2 sec/MB on general-purpose computers.

Availability and implementation: The source code and binaries, implemented in C++, are freely available for download at https://github.com/vidarmehr/ChIPWig-v2.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
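As a rough illustration of how a compressed coverage track can still support random access and summary-statistics lookups, the following Python sketch compresses values in fixed-size blocks and keeps a small per-block summary. It is not the ChIPWig algorithm: the block size, the function names `compress_blocks` and `query`, and the use of zlib as the entropy coder are all assumptions made for this example.

```python
import struct
import zlib

BLOCK = 1024  # hypothetical number of values per block (an assumption)

def compress_blocks(values, block=BLOCK):
    """Compress non-negative 32-bit integer coverage values block by block.

    Returns (blobs, stats): one compressed byte string per block, and a
    per-block (min, max, sum) summary readable without decompression.
    """
    blobs, stats = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        raw = struct.pack(f"<{len(chunk)}I", *chunk)  # little-endian uint32
        blobs.append(zlib.compress(raw))              # generic coder, not ChIPWig's
        stats.append((min(chunk), max(chunk), sum(chunk)))
    return blobs, stats

def query(blobs, start, end, block=BLOCK):
    """Return values[start:end], decompressing only the overlapping blocks."""
    out = []
    for b in range(start // block, (end - 1) // block + 1):
        raw = zlib.decompress(blobs[b])
        chunk = struct.unpack(f"<{len(raw) // 4}I", raw)
        lo = max(start - b * block, 0)            # offset of the range inside this block
        hi = min(end - b * block, len(chunk))
        out.extend(chunk[lo:hi])
    return out
```

In this sketch a range query touches only the blocks it overlaps, and block-level minimum, maximum and sum can be answered from `stats` alone. A smaller block size makes random access cheaper but usually lowers the compression ratio.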

Original language: English (US)
Pages (from-to): 911-919
Number of pages: 9
Journal: Bioinformatics
Volume: 34
Issue number: 6
State: Published - Mar 15 2018

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
