TY - GEN
T1 - Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network
AU - Kanakagiri, Raghavendra
AU - Solomonik, Edgar
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/6/17
Y1 - 2024/6/17
AB - Sparse tensor decomposition and completion are common in numerous applications, ranging from machine learning to computational quantum chemistry. Typically, the main bottleneck in optimizing these models is the contraction of a single large sparse tensor with a network of several dense matrices or tensors (SpTTN). Prior work on high-performance tensor decomposition and completion has focused on performance and scalability optimizations for specific SpTTN kernels. We present algorithms and a runtime system for identifying and executing the most efficient loop nest for any SpTTN kernel. We consider both enumeration of such loop nests for autotuning and efficient algorithms for finding the lowest-cost loop nest under simpler metrics, such as buffer size or cache miss models. Our runtime system identifies the best choice of loop nest without user guidance and also provides a distributed-memory parallelization of SpTTN kernels. We evaluate our framework using both real-world and synthetic tensors. Our results demonstrate that our approach outperforms available generalized state-of-the-art libraries and matches the performance of specialized codes.
KW - sparse tensor algebra
KW - tensor contraction
KW - tensor decomposition and completion
UR - http://www.scopus.com/inward/record.url?scp=85197456339&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197456339&partnerID=8YFLogxK
U2 - 10.1145/3626183.3659985
DO - 10.1145/3626183.3659985
M3 - Conference contribution
AN - SCOPUS:85197456339
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 169
EP - 181
BT - SPAA 2024 - Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures
PB - Association for Computing Machinery
T2 - 36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024
Y2 - 17 June 2024 through 21 June 2024
ER -