TY - JOUR
T1 - Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability
AU - Tzen, Belinda
AU - Liang, Tengyuan
AU - Raginsky, Maxim
N1 - The authors would like to thank Matus Telgarsky for many enlightening discussions. The work of Belinda Tzen and Maxim Raginsky is supported in part by the NSF under CAREER award CCF–1254041, and in part by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF–0939370. Tengyuan Liang is supported by George C. Tiao Faculty Fellowship in Data Science Research.
PY - 2018
Y1 - 2018
N2 - We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003). For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the ε-neighborhood of this particular optimum within a short recurrence time; (2) it enters this ε-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time. We call this phenomenon empirical metastability. This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the effective recurrence time (i.e., number of iterations multiplied by stepsize) is dimension-independent, and resembles the convergence time of continuous-time deterministic Gradient Descent (GD). However unlike GD, the Langevin algorithm does not require strong conditions on local initialization, and has the possibility of eventually visiting all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, which states that the Langevin scheme will eventually visit all local minima, but it will take an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.
AB - We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003). For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the ε-neighborhood of this particular optimum within a short recurrence time; (2) it enters this ε-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time. We call this phenomenon empirical metastability. This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the effective recurrence time (i.e., number of iterations multiplied by stepsize) is dimension-independent, and resembles the convergence time of continuous-time deterministic Gradient Descent (GD). However unlike GD, the Langevin algorithm does not require strong conditions on local initialization, and has the possibility of eventually visiting all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, which states that the Langevin scheme will eventually visit all local minima, but it will take an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.
UR - https://www.scopus.com/pages/publications/85162170152
UR - https://www.scopus.com/pages/publications/85162170152#tab=citedBy
M3 - Conference article
AN - SCOPUS:85162170152
SN - 2640-3498
VL - 75
SP - 857
EP - 875
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 31st Annual Conference on Learning Theory, COLT 2018
Y2 - 6 July 2018 through 9 July 2018
ER -