TY - JOUR
T1 - ASAP
T2 - Accelerated Short-Read Alignment on Programmable Hardware
AU - Banerjee, Subho Sankar
AU - El-Hadedy, Mohamed
AU - Lim, Jong Bin
AU - Kalbarczyk, Zbigniew T.
AU - Chen, Deming
AU - Lumetta, Steven S.
AU - Iyer, Ravishankar K.
N1 - This research was supported by several grants: in part by the US National Science Foundation (NSF) under Grant Nos. CNS 13-37732 and CNS 16-24790; in part by the Blue Waters sustained-petascale computing project supported by the US National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois; and in part by IBM Faculty Awards. We thank Zachary Stephens, Jenny Applequist and Kathleen Atchley for their help in preparing the manuscript.
PY - 2019/3/1
Y1 - 2019/3/1
N2 - The proliferation of high-throughput sequencing machines ensures rapid generation of up to billions of short nucleotide fragments in a short period of time. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This paper explores the use of hardware acceleration to significantly improve the runtime of short-read alignment, a crucial step in preprocessing sequenced genomes. We focus on the Levenshtein distance (edit-distance) computation kernel and propose the ASAP accelerator, which utilizes the intrinsic delay of circuits for edit-distance computation elements as a proxy for computation. Our design is implemented on a Xilinx Virtex 7 FPGA in an IBM POWER8 system that uses the CAPI interface for cache coherence across the CPU and FPGA. Our design is 200× faster than an equivalent Smith-Waterman-C implementation of the kernel running on the host processor, 40-60× faster than an equivalent Landau-Vishkin-C++ implementation of the kernel running on the IBM POWER8 host processor, and 2× faster for an end-to-end alignment tool for 120-150 base-pair short-read sequences. Further, the design represents a 3760× improvement over the CPU in performance/Watt terms.
AB - The proliferation of high-throughput sequencing machines ensures rapid generation of up to billions of short nucleotide fragments in a short period of time. This massive amount of sequence data can quickly overwhelm today's storage and compute infrastructure. This paper explores the use of hardware acceleration to significantly improve the runtime of short-read alignment, a crucial step in preprocessing sequenced genomes. We focus on the Levenshtein distance (edit-distance) computation kernel and propose the ASAP accelerator, which utilizes the intrinsic delay of circuits for edit-distance computation elements as a proxy for computation. Our design is implemented on a Xilinx Virtex 7 FPGA in an IBM POWER8 system that uses the CAPI interface for cache coherence across the CPU and FPGA. Our design is 200× faster than an equivalent Smith-Waterman-C implementation of the kernel running on the host processor, 40-60× faster than an equivalent Landau-Vishkin-C++ implementation of the kernel running on the IBM POWER8 host processor, and 2× faster for an end-to-end alignment tool for 120-150 base-pair short-read sequences. Further, the design represents a 3760× improvement over the CPU in performance/Watt terms.
KW - Bioinformatics
KW - application-specific processor
KW - genomics
KW - hardware accelerator
KW - Levenshtein distance
UR - http://www.scopus.com/inward/record.url?scp=85055031533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055031533&partnerID=8YFLogxK
U2 - 10.1109/TC.2018.2875733
DO - 10.1109/TC.2018.2875733
M3 - Article
AN - SCOPUS:85055031533
SN - 0018-9340
VL - 68
SP - 331
EP - 346
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 3
M1 - 8490591
ER -