TY - JOUR

T1 - Comparison of accuracy and scalability of Gauss-Newton and alternating least squares for CANDECOMP/PARAFAC decomposition

AU - Singh, Navjot

AU - Ma, Linjian

AU - Yang, Hongru

AU - Solomonik, Edgar

N1 - Funding Information:
* Submitted to the journal's Software and High-Performance Computing section June 11, 2020; accepted for publication (in revised form) March 22, 2021; published electronically August 4, 2021. https://doi.org/10.1137/20M1344561 Funding: The work of the first, second, and fourth authors was supported by the US NSF OAC SSI program, award 1931258. This work was supported by National Science Foundation grants ACI-1548562, OCI-0725070, and ACI-1238993, and by the Texas Advanced Computing Center (TACC) through allocation TG-CCR180006. † Department of Mathematics, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA ([email protected]). ‡ Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA ([email protected], [email protected], [email protected]).
Publisher Copyright:
© 2021 Society for Industrial and Applied Mathematics.

PY - 2021

Y1 - 2021

N2 - Alternating least squares is the most widely used algorithm for CANDECOMP/PARAFAC (CP) tensor decomposition. However, alternating least squares may exhibit slow or no convergence, especially when high accuracy is required. An alternative approach is to regard CP decomposition as a nonlinear least squares problem and employ Newton-like methods. Direct solution of linear systems involving an approximated Hessian is generally expensive. However, recent advancements have shown that use of an implicit representation of the linear system makes these methods competitive with alternating least squares (ALS). We provide the first parallel implementation of a Gauss-Newton method for CP decomposition, which iteratively solves linear least squares problems at each Gauss-Newton step. In particular, we leverage a formulation that employs tensor contractions for implicit matrix-vector products within the conjugate gradient method. The use of tensor contractions enables us to employ the Cyclops library for distributed-memory tensor computations to parallelize the Gauss-Newton approach with a high-level Python implementation. In addition, we propose a regularization scheme for the Gauss-Newton method to improve convergence properties without any additional cost. We study the convergence of variants of the Gauss-Newton method relative to ALS for finding exact CP decompositions as well as approximate decompositions of real-world tensors. We evaluate the performance of sequential and parallel versions of both approaches, and study the parallel scalability on the Stampede2 supercomputer.

AB - Alternating least squares is the most widely used algorithm for CANDECOMP/PARAFAC (CP) tensor decomposition. However, alternating least squares may exhibit slow or no convergence, especially when high accuracy is required. An alternative approach is to regard CP decomposition as a nonlinear least squares problem and employ Newton-like methods. Direct solution of linear systems involving an approximated Hessian is generally expensive. However, recent advancements have shown that use of an implicit representation of the linear system makes these methods competitive with alternating least squares (ALS). We provide the first parallel implementation of a Gauss-Newton method for CP decomposition, which iteratively solves linear least squares problems at each Gauss-Newton step. In particular, we leverage a formulation that employs tensor contractions for implicit matrix-vector products within the conjugate gradient method. The use of tensor contractions enables us to employ the Cyclops library for distributed-memory tensor computations to parallelize the Gauss-Newton approach with a high-level Python implementation. In addition, we propose a regularization scheme for the Gauss-Newton method to improve convergence properties without any additional cost. We study the convergence of variants of the Gauss-Newton method relative to ALS for finding exact CP decompositions as well as approximate decompositions of real-world tensors. We evaluate the performance of sequential and parallel versions of both approaches, and study the parallel scalability on the Stampede2 supercomputer.

KW - Alternating least squares

KW - CP decomposition

KW - Cyclops tensor framework

KW - Gauss-Newton method

KW - Tensor decomposition

UR - http://www.scopus.com/inward/record.url?scp=85112402008&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85112402008&partnerID=8YFLogxK

U2 - 10.1137/20M1344561

DO - 10.1137/20M1344561

M3 - Article

AN - SCOPUS:85112402008

SN - 1064-8275

VL - 43

SP - C290

EP - C311

JO - SIAM Journal on Scientific Computing

JF - SIAM Journal on Scientific Computing

IS - 4

ER -