Denoising of Aligned Genomic Data

Irena Fischer-Hwang, Idoia Ochoa, Tsachy Weissman, Mikel Hernaez

Research output: Contribution to journal (Article)

Abstract

Noise in genomic sequencing data is known to affect various stages of genomic data analysis pipelines. Variant identification is an important step in many of these pipelines and is increasingly used in clinical settings to aid medical practice. We propose a denoising method, dubbed SAMDUDE, which operates on aligned genomic data in order to improve variant calling performance. Denoising human data with SAMDUDE improved variant identification in both individual-chromosome and whole genome sequencing (WGS) data sets. In the WGS data set, denoising led to the identification of almost 2,000 additional true variants and the elimination of over 1,500 erroneously identified variants. In contrast, we found that denoising with other state-of-the-art denoisers significantly worsens variant calling performance. SAMDUDE is written in Python and is freely available at https://github.com/ihwang/SAMDUDE.
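The WGS result above is stated as two counts: true variants gained and erroneous calls eliminated. As a minimal sketch, assuming exact matching of variants by chromosome, position, and alleles, such counts can be derived by comparing call sets made before and after denoising against a ground-truth set. The `Variant` type, the `compare_call_sets` function, and the toy data are hypothetical illustrations, not part of SAMDUDE, and this is not necessarily how the authors evaluated their method; dedicated benchmarking tools such as hap.py or RTG Tools vcfeval use more careful, haplotype-aware matching.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, and alleles (genotype ignored)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_call_sets(truth: set, before: set, after: set) -> dict:
    """Hypothetical helper (not part of SAMDUDE): summarize how denoising changed
    variant calls relative to a truth set, using exact set matching.

    truth:  variants in the ground-truth set
    before: variants called from the original reads
    after:  variants called from the denoised reads
    """
    tp_before, tp_after = before & truth, after & truth  # correct calls
    fp_before, fp_after = before - truth, after - truth  # erroneous calls
    return {
        "gained_true": len(tp_after - tp_before),    # true variants found only after denoising
        "lost_true": len(tp_before - tp_after),      # true variants found only before denoising
        "removed_false": len(fp_before - fp_after),  # erroneous calls eliminated by denoising
        "new_false": len(fp_after - fp_before),      # erroneous calls introduced by denoising
    }


# Toy, made-up call sets for illustration only.
truth = {
    Variant("chr20", 100, "A", "G"),
    Variant("chr20", 200, "C", "T"),
    Variant("chr20", 300, "G", "A"),
}
before = {Variant("chr20", 100, "A", "G"), Variant("chr20", 250, "T", "C")}
after = {Variant("chr20", 100, "A", "G"), Variant("chr20", 200, "C", "T")}

print(compare_call_sets(truth, before, after))
# → {'gained_true': 1, 'lost_true': 0, 'removed_false': 1, 'new_false': 0}
```

Reporting gains and losses separately, rather than only net totals, makes it visible when a denoiser trades some correct calls for others.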

Original language: English (US)
Article number: 15067
Journal: Scientific Reports
Volume: 9
Issue number: 1
DOIs: https://doi.org/10.1038/s41598-019-51418-z
State: Published - Dec 1 2019

ASJC Scopus subject areas

  • General


Cite this

    Fischer-Hwang, I., Ochoa, I., Weissman, T., & Hernaez, M. (2019). Denoising of Aligned Genomic Data. Scientific Reports, 9(1), 15067. https://doi.org/10.1038/s41598-019-51418-z