On the convergence of a class of Adam-type algorithms for non-convex optimization

Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

Research output: Contribution to conference › Paper

Abstract

This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the “Adam-type”, includes popular algorithms such as Adam (Kingma & Ba, 2014), AMSGrad (Reddi et al., 2018), and AdaGrad (Duchi et al., 2011). Despite their popularity in training deep neural networks (DNNs), the convergence of these algorithms for solving non-convex problems remains an open question. In this paper, we develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of Adam-type methods, with a convergence rate of order O(log T/√T) for non-convex stochastic optimization. Our convergence analysis applies to a new algorithm called AdaFom (AdaGrad with First Order Momentum). We show that the conditions are essential by identifying concrete examples in which violating them makes an algorithm diverge. Besides providing one of the first comprehensive analyses of Adam-type methods in the non-convex setting, our results can also help practitioners easily monitor the progress of algorithms and determine their convergence behavior.
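
The update template shared by this family couples a momentum (first-order) direction built from past gradients with a per-coordinate effective step size built from past squared gradients. As a concrete illustration, below is a minimal Python sketch of AdaFom (AdaGrad with First Order Momentum) reconstructed from the description in the abstract; the hyperparameter names and values (alpha, beta1, eps), the alpha/sqrt(t) step-size schedule, and the toy objective are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

class AdaFom:
    """Minimal sketch of AdaFom (AdaGrad with First Order Momentum).

    Adam-type methods share the template
        m_t     = beta1 * m_{t-1} + (1 - beta1) * g_t      # momentum direction
        x_{t+1} = x_t - alpha_t * m_t / (sqrt(v_t) + eps)  # per-coordinate step
    and differ in how v_t is built from past squared gradients.
    As sketched here, AdaFom uses the running average of g_i^2
    (an AdaGrad-style accumulator) instead of Adam's exponential
    moving average; beta1, eps, and the alpha/sqrt(t) schedule are
    conventional choices assumed for illustration.
    """

    def __init__(self, dim, alpha=0.1, beta1=0.9, eps=1e-8):
        self.alpha, self.beta1, self.eps = alpha, beta1, eps
        self.m = np.zeros(dim)   # first-order momentum
        self.v = np.zeros(dim)   # running average of squared gradients
        self.t = 0               # iteration counter

    def step(self, x, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        # running average: v_t = ((t - 1) * v_{t-1} + g_t^2) / t
        self.v += (grad ** 2 - self.v) / self.t
        lr = self.alpha / np.sqrt(self.t)        # diminishing step size
        return x - lr * self.m / (np.sqrt(self.v) + self.eps)


# Toy non-convex objective f(x) = sum(x_i^2 / (1 + x_i^2)); gradient below.
def grad_f(x):
    return 2 * x / (1 + x ** 2) ** 2

x = np.array([2.0, -3.0])
opt = AdaFom(dim=2)
for _ in range(5000):
    x = opt.step(x, grad_f(x))
print(x)  # x is driven toward the stationary point at the origin
```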

Original language: English (US)
State: Published - Jan 1 2019
Event: 7th International Conference on Learning Representations, ICLR 2019 - New Orleans, United States
Duration: May 6 2019 - May 9 2019

Conference

Conference: 7th International Conference on Learning Representations, ICLR 2019
Country: United States
City: New Orleans
Period: 5/6/19 - 5/9/19

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Cite this

Chen, X., Liu, S., Sun, R., & Hong, M. (2019). On the convergence of a class of Adam-type algorithms for non-convex optimization. Paper presented at 7th International Conference on Learning Representations, ICLR 2019, New Orleans, United States.
