TY - GEN
T1 - Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization
AU - Zhang, Gavin
AU - Fattahi, Salar
AU - Zhang, Richard Y.
N1 - Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
AB - In practical instances of nonconvex matrix factorization, the rank of the true solution r* is often unknown, so the rank r of the model can be over-specified as r > r*. This over-parameterized regime of matrix factorization significantly slows down the convergence of local search algorithms, from a linear rate with r = r* to a sublinear rate when r > r*. We propose an inexpensive preconditioner for the matrix sensing variant of nonconvex matrix factorization that restores the convergence rate of gradient descent back to linear, even in the over-parameterized case, while also making it agnostic to possible ill-conditioning in the ground truth. Classical gradient descent in a neighborhood of the solution slows down due to the need for the model matrix factor to become singular. Our key result is that this singularity can be corrected by ℓ2 regularization with a specific range of values for the damping parameter. In fact, a good damping parameter can be inexpensively estimated from the current iterate. The resulting algorithm, which we call preconditioned gradient descent or PrecGD, is stable under noise and converges linearly to an information-theoretically optimal error bound. Our numerical experiments find that PrecGD works equally well in restoring the linear convergence of other variants of nonconvex matrix factorization in the over-parameterized regime.
UR - http://www.scopus.com/inward/record.url?scp=85131797689&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131797689&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85131797689
T3 - Advances in Neural Information Processing Systems
SP - 5985
EP - 5996
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -
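
A minimal Python/NumPy sketch of the update the abstract describes: gradient descent on the matrix-sensing loss, with the gradient right-preconditioned by the inverse of X^T X plus an ℓ2 damping term estimated from the current iterate. The specific damping rule eta = sqrt(2 * loss), the step size, the iteration budget, and the helper name precgd are illustrative assumptions, not the paper's exact prescription.

import numpy as np

def precgd(A, b, n, r, steps=1000, lr=0.25, seed=0):
    """PrecGD-style sketch for matrix sensing.

    Minimizes f(X) = (1/2m) * sum_i (<A_i, X X^T> - b_i)^2 over X in R^{n x r},
    stepping along grad f(X) @ inv(X^T X + eta * I). The damping eta is set
    from the current loss, one choice consistent with "inexpensively estimated
    from the current iterate"; the paper establishes the admissible range.
    """
    m = len(A)
    rng = np.random.default_rng(seed)
    X = 0.1 * rng.standard_normal((n, r))  # small random initialization
    for _ in range(steps):
        M = X @ X.T
        resid = np.array([np.sum(Ai * M) for Ai in A]) - b  # <A_i, XX^T> - b_i
        loss = resid @ resid / (2 * m)
        # Euclidean gradient: (1/m) * sum_i resid_i * (A_i + A_i^T) @ X.
        G = sum(ri * (Ai + Ai.T) for ri, Ai in zip(resid, A)) @ X / m
        # Damping on the scale of the residual norm; small floor avoids a
        # singular solve when the loss is numerically zero.
        eta = np.sqrt(2 * loss) + 1e-12
        # Right-preconditioned step, computed via a linear solve.
        X = X - lr * np.linalg.solve(X.T @ X + eta * np.eye(r), G.T).T
    return X

# Usage: a rank-2 ground truth recovered with an over-specified rank-5 model.
n, r, r_true, m = 20, 5, 2, 400
rng = np.random.default_rng(1)
Z = rng.standard_normal((n, r_true))
M_star = Z @ Z.T                                     # ground truth, rank r_true
A = [rng.standard_normal((n, n)) for _ in range(m)]  # Gaussian sensing matrices
b = np.array([np.sum(Ai * M_star) for Ai in A])      # noiseless measurements
X_hat = precgd(A, b, n, r)
print(np.linalg.norm(X_hat @ X_hat.T - M_star))      # recovery error, should be small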