Denoising of Quality Scores for Boosted Inference and Reduced Storage

Idoia Ochoa, Mikel Hernaez, Rachel Goldfeder, Tsachy Weissman, Euan Ashley

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in sequencing cost. Much of the raw data consists of nucleotides and the corresponding quality scores that indicate their reliability. The latter are more difficult to compress and are themselves noisy. Lossless and lossy compression of the quality scores has recently been proposed to alleviate storage costs, but reducing the noise in the quality scores has remained largely unexplored. The raw data is processed to identify genetic variants, which are used in important applications such as medical decision making. Improving the performance of variant calling by reducing the noise in the quality scores is therefore important. We propose a denoising scheme that reduces the noise in the quality scores, and we demonstrate improved inference with the denoised data. Specifically, we show that replacing the quality scores with those generated by the proposed denoiser results in more accurate variant calling in general. Moreover, a consequence of the denoising is that the entropy of the produced quality scores is smaller, and thus significant compression can be achieved relative to lossless compression of the original quality scores. We expect our results to provide a baseline for future research on denoising of quality scores.

Original language: English (US)
Title of host publication: Proceedings - DCC 2016
Subtitle of host publication: 2016 Data Compression Conference
Editors: Michael W. Marcellin, Ali Bilgin, Joan Serra-Sagrista, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 251-260
Number of pages: 10
ISBN (Electronic): 9781509018536
State: Published - Dec 15 2016
Event: 2016 Data Compression Conference, DCC 2016 - Snowbird, United States
Duration: Mar 29 2016 → Apr 1 2016

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314

Other

Other: 2016 Data Compression Conference, DCC 2016
Country/Territory: United States
City: Snowbird
Period: 3/29/16 → 4/1/16

ASJC Scopus subject areas

  • Computer Networks and Communications
