TY - GEN
T1 - Optimizing Sparse Matrix Operations on GPUs Using Merge Path
AU - Dalton, Steven
AU - Baxter, Sean
AU - Merrill, Duane
AU - Olson, Luke
AU - Garland, Michael
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/7/17
Y1 - 2015/7/17
N2 - Irregular computations on large workloads are a necessity in many areas of computational science. Mapping these computations to modern parallel architectures, such as GPUs, is particularly challenging because the performance often depends critically on the choice of data structure and algorithm. In this paper, we develop a parallel processing scheme, based on Merge Path partitioning, to compute segmented row-wise operations on sparse matrices that exposes parallelism at the granularity of individual nonzero entries. Our decomposition achieves competitive performance across many diverse problems while maintaining predictable behaviour dependent only on the computational work, and ameliorates the impact of irregularity. We evaluate the performance of three sparse kernels: SpMV, SpAdd and SpGEMM. We show that our processing scheme for each kernel yields performance comparable to other schemes in many cases, and that our performance is highly correlated (a correlation of nearly 1) with the computational work, irrespective of the underlying structure of the matrices.
AB - Irregular computations on large workloads are a necessity in many areas of computational science. Mapping these computations to modern parallel architectures, such as GPUs, is particularly challenging because the performance often depends critically on the choice of data structure and algorithm. In this paper, we develop a parallel processing scheme, based on Merge Path partitioning, to compute segmented row-wise operations on sparse matrices that exposes parallelism at the granularity of individual nonzero entries. Our decomposition achieves competitive performance across many diverse problems while maintaining predictable behaviour dependent only on the computational work, and ameliorates the impact of irregularity. We evaluate the performance of three sparse kernels: SpMV, SpAdd and SpGEMM. We show that our processing scheme for each kernel yields performance comparable to other schemes in many cases, and that our performance is highly correlated (a correlation of nearly 1) with the computational work, irrespective of the underlying structure of the matrices.
KW - gpu
KW - parallel
KW - sorting
KW - sparse matrix-matrix
UR - http://www.scopus.com/inward/record.url?scp=84971377598&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971377598&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2015.98
DO - 10.1109/IPDPS.2015.98
M3 - Conference contribution
AN - SCOPUS:84971377598
T3 - Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015
SP - 407
EP - 416
BT - Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015
Y2 - 25 May 2015 through 29 May 2015
ER -