TY - GEN
T1 - Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree
AU - Ma, Linjian
AU - Solomonik, Edgar
N1 - Funding Information:
Linjian Ma and Edgar Solomonik were supported by the US NSF OAC SSI program, award No. 1931258. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. We used XSEDE to employ Stampede2 at the Texas Advanced Computing Center (TACC) through allocation TG-CCR180006.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order N input tensor and the first-contracted input matrix once every (N-1)/N sweeps. This algorithm reduces the leading order computational cost by a factor of 2(N-1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors on the Stampede2 supercomputer show that CP decomposition obtains a 1.25X speed-up from MSDT and a 1.94X speedup from pairwise perturbation compared to the state-of-the-art dimension-tree based CP-ALS implementations.
AB - The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order N input tensor and the first-contracted input matrix once every (N-1)/N sweeps. This algorithm reduces the leading order computational cost by a factor of 2(N-1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors on the Stampede2 supercomputer show that CP decomposition obtains a 1.25X speed-up from MSDT and a 1.94X speedup from pairwise perturbation compared to the state-of-the-art dimension-tree based CP-ALS implementations.
KW - Communication avoiding algorithm
KW - CP decomposition
KW - Dimension tree
KW - Pairwise perturbation
KW - Tensor decomposition
UR - http://www.scopus.com/inward/record.url?scp=85113452211&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113452211&partnerID=8YFLogxK
U2 - 10.1109/IPDPS49936.2021.00049
DO - 10.1109/IPDPS49936.2021.00049
M3 - Conference contribution
AN - SCOPUS:85113452211
T3 - Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
SP - 412
EP - 421
BT - Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021
Y2 - 17 May 2021 through 21 May 2021
ER -