Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

Cong Fang, Zhouchen Lin, Tong Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an (ε, O(ε^{0.5}))-approximate second-order stationary point in Õ(ε^{-3.5}) stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions. This result subverts the classical belief that SGD requires at least O(ε^{-4}) stochastic gradient computations to obtain an (ε, O(ε^{0.5}))-approximate second-order stationary point. This SGD rate matches, up to a polylogarithmic factor of problem-dependent parameters, the rate of most accelerated nonconvex stochastic optimization algorithms that adopt additional techniques, such as Nesterov’s momentum acceleration, negative curvature search, and quadratic and cubic regularization tricks. Our novel analysis gives new insights into nonconvex SGD and can potentially be generalized to a broad class of stochastic optimization algorithms.
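
For reference, a sketch of the standard definitions behind the terminology above, written in LaTeX; the paper's exact constants and constructions may differ from this generic formulation.

% A point x is an (\epsilon, \delta)-approximate second-order stationary point of f if
\[
  \|\nabla f(x)\| \le \epsilon
  \qquad \text{and} \qquad
  \lambda_{\min}\!\bigl(\nabla^2 f(x)\bigr) \ge -\delta ,
\]
% with \delta = O(\epsilon^{0.5}) in the result above. SGD itself is the plain update
\[
  x_{t+1} = x_t - \eta\, g_t ,
  \qquad \mathbb{E}[\,g_t \mid x_t\,] = \nabla f(x_t) ,
\]
% and the abstract asserts that \tilde{O}(\epsilon^{-3.5}) such stochastic gradient
% evaluations suffice under gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.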

Original language: English (US)
Pages (from-to): 1192-1234
Number of pages: 43
Journal: Proceedings of Machine Learning Research
Volume: 99
State: Published - 2019
Externally published: Yes
Event: 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States
Duration: Jun 25 2019 - Jun 28 2019

Keywords

  • Convergence Rate
  • Non-convex Optimization
  • Saddle Escaping
  • Stochastic Gradient Descent

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
