TY - GEN
T1 - Improved GPU Implementations of the Pair-HMM Forward Algorithm for DNA Sequence Alignment
AU - Li, Enliang
AU - Banerjee, Subho S.
AU - Huang, Sitao
AU - Iyer, Ravishankar K.
AU - Chen, Deming
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - With the rise of Next-Generation Sequencing (NGS) technology, clinical sequencing services have become more accessible but also face new challenges. The surging demand motivates the development of more efficient algorithms for computational genomics and of hardware acceleration for them. In this work, we use GPUs to accelerate DNA variant calling and its related alignment problem. The Pair-Hidden Markov Model (Pair-HMM) is one of the most popular and compute-intensive models used in variant calling. As a critical part of the Pair-HMM, the forward algorithm is not only compute-intensive but also data-intensive. Multiple previous works have sought to accelerate the forward algorithm through massive parallelization of the workload. In this paper, we present advanced GPU implementations with various optimizations, such as efficient host-device communication, task parallelization, pipelining, and memory management, to tackle this challenging task. Our design achieves a speedup of 783X compared to the Java baseline on an Intel single-core CPU, 31.88X compared to the C++ baseline on an IBM Power8 multicore CPU, and 1.53X - 2.21X compared to the previous state-of-the-art GPU implementations across various genomics datasets.
AB - With the rise of Next-Generation Sequencing (NGS) technology, clinical sequencing services have become more accessible but also face new challenges. The surging demand motivates the development of more efficient algorithms for computational genomics and of hardware acceleration for them. In this work, we use GPUs to accelerate DNA variant calling and its related alignment problem. The Pair-Hidden Markov Model (Pair-HMM) is one of the most popular and compute-intensive models used in variant calling. As a critical part of the Pair-HMM, the forward algorithm is not only compute-intensive but also data-intensive. Multiple previous works have sought to accelerate the forward algorithm through massive parallelization of the workload. In this paper, we present advanced GPU implementations with various optimizations, such as efficient host-device communication, task parallelization, pipelining, and memory management, to tackle this challenging task. Our design achieves a speedup of 783X compared to the Java baseline on an Intel single-core CPU, 31.88X compared to the C++ baseline on an IBM Power8 multicore CPU, and 1.53X - 2.21X compared to the previous state-of-the-art GPU implementations across various genomics datasets.
KW - CUDA implementation
KW - Computational Genomics
KW - Forward algorithm
KW - GPU acceleration
KW - Pair-HMM
UR - http://www.scopus.com/inward/record.url?scp=85123913164&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123913164&partnerID=8YFLogxK
U2 - 10.1109/ICCD53106.2021.00055
DO - 10.1109/ICCD53106.2021.00055
M3 - Conference contribution
AN - SCOPUS:85123913164
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 299
EP - 306
BT - Proceedings - 2021 IEEE 39th International Conference on Computer Design, ICCD 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 39th IEEE International Conference on Computer Design, ICCD 2021
Y2 - 24 October 2021 through 27 October 2021
ER -