Do read errors matter for genome assembly?

Ilan Shomorony, Thomas Courtade, David Tse

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.

Original languageEnglish (US)
Title of host publicationProceedings - 2015 IEEE International Symposium on Information Theory, ISIT 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages919-923
Number of pages5
ISBN (Electronic)9781467377041
DOIs
StatePublished - Sep 28 2015
Externally publishedYes
EventIEEE International Symposium on Information Theory, ISIT 2015 - Hong Kong, Hong Kong
Duration: Jun 14 2015Jun 19 2015

Publication series

NameIEEE International Symposium on Information Theory - Proceedings
Volume2015-June
ISSN (Print)2157-8095

Other

OtherIEEE International Symposium on Information Theory, ISIT 2015
Country/TerritoryHong Kong
CityHong Kong
Period6/14/156/19/15

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

Fingerprint

Dive into the research topics of 'Do read errors matter for genome assembly?'. Together they form a unique fingerprint.

Cite this