Characterizing and optimizing the memory footprint of de novo short read DNA sequence assembly

Jeffrey J. Cook, Craig Zilles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work, we analyze the memory-intensive bioinformatics problem of "de novo" DNA sequence assembly, which is the process of assembling short DNA sequences obtained by experiment into larger contiguous sequences. In particular, we analyze the performance scaling challenges inherent to de Bruijn graph-based assembly, which is particularly well suited for the data produced by "next generation" sequencing machines. Unlike many bioinformatics codes, which are computation-intensive or control-intensive, we find the memory footprint to be the primary performance issue for de novo sequence assembly. Specifically, we make four main contributions: 1) we demonstrate analytically that performing error correction before sequence assembly enables larger genomes to be assembled in a given amount of memory, 2) we identify that the use of this technique provides the key performance advantage to the leading assembly code, Velvet, 3) we demonstrate how this pre-assembly error correction technique can be subdivided into multiple passes to enable de Bruijn graph-based assembly to scale to even larger genomes, and 4) we demonstrate how Velvet's in-core performance can be improved using memory-centric optimizations.
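
As a rough illustration of why pre-assembly error correction can shrink the graph, the following Python sketch counts k-mers in a toy read set and discards those seen fewer than a threshold number of times before building de Bruijn graph edges. It is not the paper's method or Velvet's implementation: the function names, the toy genome, the read length, `k`, and the `min_count` threshold are all assumptions chosen for illustration, and simple frequency filtering stands in for whatever correction scheme the paper analyzes.

```python
from collections import Counter


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def solid_kmers(counts, min_count):
    """Keep only k-mers seen at least min_count times (hypothetical filter)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}


def de_bruijn_edges(kmer_set):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    edges = {}
    for kmer in kmer_set:
        edges.setdefault(kmer[:-1], set()).add(kmer[1:])
    return edges


# Toy data (assumed): error-free reads at 3x redundancy plus one erroneous read.
genome = "ACGTTGCATGGA"
read_len = 8
k = 4
reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)] * 3
reads.append("ACGTAGCA")  # copy of the first read with one substitution error

all_counts = count_kmers(reads, k)
kept = solid_kmers(all_counts, min_count=2)
graph = de_bruijn_edges(kept)

print("distinct k-mers before filtering:", len(all_counts))
print("distinct k-mers after filtering: ", len(kept))
```

In this toy case a single substitution error introduces several k-mers that occur only once, and each would otherwise occupy space in the graph; dropping them before construction leaves only the k-mers of the underlying sequence.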

Original language: English (US)
Title of host publication: ISPASS 2009 - International Symposium on Performance Analysis of Systems and Software
Pages: 143-152
Number of pages: 10
DOIs: https://doi.org/10.1109/ISPASS.2009.4919646
State: Published - Sep 22 2009
Event: International Symposium on Performance Analysis of Systems and Software, ISPASS 2009 - Boston, MA, United States
Duration: Apr 26 2009 to Apr 28 2009

Publication series

Name: ISPASS 2009 - International Symposium on Performance Analysis of Systems and Software

Other

Other: International Symposium on Performance Analysis of Systems and Software, ISPASS 2009
Country: United States
City: Boston, MA
Period: 4/26/09 to 4/28/09

ASJC Scopus subject areas

  • Information Systems
  • Software

  • Cite this

    Cook, J. J., & Zilles, C. (2009). Characterizing and optimizing the memory footprint of de novo short read DNA sequence assembly. In ISPASS 2009 - International Symposium on Performance Analysis of Systems and Software (pp. 143-152). [4919646] (ISPASS 2009 - International Symposium on Performance Analysis of Systems and Software). https://doi.org/10.1109/ISPASS.2009.4919646