TY - GEN
T1 - Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
AU - Ranawaka, Isuru
AU - Hussain, Md Taufique
AU - Block, Charles
AU - Gerogiannis, Gerasimos
AU - Torrellas, Josep
AU - Azad, Ariful
N1 - This research was funded in part by DOE grants DE-SC0022098 and DE-SC0023349; by NSF grants PPoSS CCF 2316233 and OAC-2339607; by SRC JUMP 2.0 ACE Center; and by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 21-46756.
PY - 2024
Y1 - 2024
N2 - We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS-SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TS-SpGEMM algorithm attains 5× performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 nodes (or 65,536 cores) on NERSC Perlmutter.
UR - http://www.scopus.com/inward/record.url?scp=85214986600&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85214986600&partnerID=8YFLogxK
U2 - 10.1109/SC41406.2024.00052
DO - 10.1109/SC41406.2024.00052
M3 - Conference contribution
AN - SCOPUS:85214986600
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2024
PB - IEEE Computer Society
T2 - 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Y2 - 17 November 2024 through 22 November 2024
ER -