TY - GEN
T1 - An algorithm for fast edit distance computation on GPUs
AU - Farivar, Reza
AU - Kharbanda, Harshit
AU - Venkataraman, Shivaram
AU - Campbell, R H
PY - 2012/12/12
Y1 - 2012/12/12
N2 - Finding the edit distance between two sequences, and the closely related problem of finding their longest common subsequence, are important problems with applications in many domains, such as virus scanners, security kernels, natural language translation and genome sequence alignment. The traditional dynamic-programming algorithm is hard to parallelize on SIMD processors because it is memory intensive and has many divergent control paths. In this paper we introduce a new algorithm that modifies the dynamic programming method to reduce its data storage and eliminate control flow divergence. Our algorithm divides the problem into independent 'quadrants' and makes efficient use of the shared memory and registers available in GPUs to store data between different phases of the algorithm. Further, we eliminate control flow divergence by embedding condition variables in the program logic, which ensures that all threads execute the same instructions even though they work on different data items. We present an implementation of this algorithm on an NVIDIA GeForce GTX 275 GPU and compare it against an optimized multi-threaded implementation on an Intel Core i7-920 quad-core CPU with hyper-threading support. Our results show that our GPU implementation is up to 8x faster when operating on a large number of sequences.
AB - Finding the edit distance between two sequences, and the closely related problem of finding their longest common subsequence, are important problems with applications in many domains, such as virus scanners, security kernels, natural language translation and genome sequence alignment. The traditional dynamic-programming algorithm is hard to parallelize on SIMD processors because it is memory intensive and has many divergent control paths. In this paper we introduce a new algorithm that modifies the dynamic programming method to reduce its data storage and eliminate control flow divergence. Our algorithm divides the problem into independent 'quadrants' and makes efficient use of the shared memory and registers available in GPUs to store data between different phases of the algorithm. Further, we eliminate control flow divergence by embedding condition variables in the program logic, which ensures that all threads execute the same instructions even though they work on different data items. We present an implementation of this algorithm on an NVIDIA GeForce GTX 275 GPU and compare it against an optimized multi-threaded implementation on an Intel Core i7-920 quad-core CPU with hyper-threading support. Our results show that our GPU implementation is up to 8x faster when operating on a large number of sequences.
UR - http://www.scopus.com/inward/record.url?scp=84870723547&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870723547&partnerID=8YFLogxK
U2 - 10.1109/InPar.2012.6339593
DO - 10.1109/InPar.2012.6339593
M3 - Conference contribution
AN - SCOPUS:84870723547
SN - 9781467326322
T3 - 2012 Innovative Parallel Computing, InPar 2012
BT - 2012 Innovative Parallel Computing, InPar 2012
T2 - 2012 Innovative Parallel Computing, InPar 2012
Y2 - 13 May 2012 through 14 May 2012
ER -