TY - JOUR
T1 - Distributed-memory tensor completion for generalized loss functions in Python using new sparse tensor kernels
AU - Singh, Navjot
AU - Zhang, Zecheng
AU - Wu, Xiaoxiao
AU - Zhang, Naijing
AU - Zhang, Siyuan
AU - Solomonik, Edgar
N1 - Funding Information:
This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Via XSEDE, the authors made use of the TACC Stampede2 supercomputer. The research was supported by the US NSF OAC via award No. 1942995.
Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/11
Y1 - 2022/11
N2 - Tensor computations are increasingly prevalent numerical techniques in data science, but pose unique challenges for high-performance implementation. We provide novel algorithms and systems infrastructure which enable efficient parallel implementation of algorithms for tensor completion with generalized loss functions. Specifically, we consider alternating minimization, coordinate minimization, and a quasi-Newton (generalized Gauss-Newton) method. By extending the Cyclops library, we implement all of these methods in high-level Python syntax. To make possible tensor completion for very sparse tensors, we introduce new multi-tensor primitives, for which we provide specialized parallel implementations. We compare these routines to pairwise contraction of sparse tensors by reduction to hypersparse matrix formats, and find that the multi-tensor routines are more efficient in theoretical cost and execution time in experiments. We provide microbenchmarking results on the Stampede2 supercomputer to demonstrate the efficiency of the new primitives and Cyclops functionality. We then study the performance of the tensor completion methods for a synthetic tensor with 10 billion nonzeros and the Netflix dataset, considering both least squares and Poisson loss functions.
KW - CP decomposition
KW - Cyclops tensor framework
KW - Tensor completion
UR - http://www.scopus.com/inward/record.url?scp=85135109176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135109176&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2022.07.005
DO - 10.1016/j.jpdc.2022.07.005
M3 - Article
AN - SCOPUS:85135109176
SN - 0743-7315
VL - 169
SP - 269
EP - 285
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -