Efficient and scalable workflows for genomic analyses

Subho S. Banerjee, Arjun P. Athreya, Liudmila S. Mainzer, C. Victor Jongeneel, Wen Mei Hwu, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent growth in the volume of DNA sequence data, and the associated computational cost of extracting meaningful information from it, calls for efficient computational systems at scale. In this work, we propose the Illinois Genomics Execution Environment (IGen), a framework for efficient and scalable genome analyses. The design philosophy of IGen is based on algorithmic analysis and extensive measurements of compute- and data-intensive genomic analysis workflows (such as variant discovery and genotyping analysis) executed on high-performance and cloud computing infrastructures. IGen leverages the advantages of existing designs and proposes new software improvements to overcome the inefficiencies we observe in our measurements. Based on these composite improvements, we demonstrate that IGen accelerates alignment from 13.1 hours to 10.8 hours (1.2×) and variant calling from 10.1 hours to 1.25 hours (8×) on a single node, and that its modular design scales efficiently in a parallel computing environment.
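The speedup factors quoted in the abstract follow directly from the reported single-node runtimes. The short sketch below recomputes them. The function name `speedup` is illustrative, and the combined figure rests on an assumption not stated in the abstract: that alignment and variant calling run back to back, so that their runtimes simply add.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Return the ratio of the old runtime to the new runtime."""
    if before_hours <= 0 or after_hours <= 0:
        raise ValueError("runtimes must be positive")
    return before_hours / after_hours


# Single-node runtimes in hours, as reported in the abstract.
alignment = speedup(13.1, 10.8)
variant_calling = speedup(10.1, 1.25)

# Assumption (not from the abstract): the two stages run sequentially,
# so the end-to-end runtime is the sum of the two stage runtimes.
combined = speedup(13.1 + 10.1, 10.8 + 1.25)

print(f"alignment:       {alignment:.2f}x")
print(f"variant calling: {variant_calling:.2f}x")
print(f"combined:        {combined:.2f}x")
```

Under that sequential assumption, the end-to-end gain is a little under 2×, because the alignment stage, which improves only modestly, dominates the remaining runtime.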

Original language: English (US)
Title of host publication: DIDC 2016 - Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 27-36
Number of pages: 10
ISBN (Electronic): 9781450343527
DOIs: https://doi.org/10.1145/2912152.2912156
State: Published - Jun 1 2016
Event: 6th ACM International Workshop on Data-Intensive Distributed Computing, DIDC 2016 - Kyoto, Japan
Duration: Jun 1 2016 → …

Publication series

Name: DIDC 2016 - Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing

Other

Other: 6th ACM International Workshop on Data-Intensive Distributed Computing, DIDC 2016
Country: Japan
City: Kyoto
Period: 6/1/16 → …

Keywords

  • Bioinformatics
  • Design
  • Genomics
  • Measurement
  • Performance

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Applied Mathematics


Cite this

Banerjee, S. S., Athreya, A. P., Mainzer, L. S., Jongeneel, C. V., Hwu, W. M., Kalbarczyk, Z. T., & Iyer, R. K. (2016). Efficient and scalable workflows for genomic analyses. In DIDC 2016 - Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing (pp. 27-36). (DIDC 2016 - Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing). Association for Computing Machinery, Inc. https://doi.org/10.1145/2912152.2912156