Accelerating stochastic gradient descent using predictive variance reduction

Rie Johnson, Tong Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG). However, our analysis is significantly simpler and more intuitive. Moreover, unlike SDCA or SAG, our method does not require the storage of gradients, and thus is more easily applicable to complex problems such as some structured prediction problems and neural network learning.
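
The core idea of the variance reduction described above can be summarized in a few lines. The sketch below is a minimal Python illustration, assuming access to per-example gradients; the function names (`svrg`, `grad_i`), the step size, and the inner-loop length are illustrative choices, not taken from the paper.

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.1, outer_iters=20, inner_iters=None, rng=None):
    """Minimal SVRG sketch for minimizing (1/n) * sum_i f_i(w).

    grad_i(w, i) returns the gradient of the i-th component f_i at w.
    All names and defaults here are illustrative, not from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = 2 * n if inner_iters is None else inner_iters  # inner-loop length
    w_tilde = np.asarray(w0, dtype=float)
    for _ in range(outer_iters):
        # Full gradient at the snapshot w_tilde, computed once per outer loop.
        mu = np.mean([grad_i(w_tilde, i) for i in range(n)], axis=0)
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient estimate: unbiased, and its variance
            # shrinks as w and w_tilde approach the optimum.
            v = grad_i(w, i) - grad_i(w_tilde, i) + mu
            w = w - lr * v
        w_tilde = w  # update the snapshot with the last inner iterate
    return w_tilde
```

Only the current iterate, the snapshot, and the averaged full gradient are kept, which is why no per-example gradient storage is needed; the paper also discusses alternatives for the snapshot, such as taking a randomly chosen inner iterate instead of the last one.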

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
State: Published - 2013
Externally published: Yes
Event: 27th Annual Conference on Neural Information Processing Systems, NIPS 2013 - Lake Tahoe, NV, United States
Duration: Dec 5, 2013 - Dec 10, 2013

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
