Efficient mini-batch training for stochastic optimization

Mu Li, Tong Zhang, Yuqiang Chen, Alexander J. Smola

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Stochastic gradient descent (SGD) is a popular technique for large-scale optimization problems in machine learning. In order to parallelize SGD, minibatch training needs to be employed to reduce the communication cost. However, an increase in minibatch size typically decreases the rate of convergence. This paper introduces a technique based on approximate optimization of a conservatively regularized objective function within each minibatch. We prove that the convergence rate does not decrease with increasing minibatch size. Experiments demonstrate that with suitable implementations of approximate optimization, the resulting algorithm can outperform standard SGD in many scenarios.
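
The abstract's core idea is to replace the single gradient step on each minibatch with an approximate minimization of a conservatively regularized (proximal) minibatch objective, which is what allows the minibatch size to grow without slowing convergence. The sketch below is a minimal illustration of that idea, not the paper's algorithm: it assumes a least-squares loss, a quadratic conservative term (gamma/2)||u - w||^2 anchored at the current iterate, and a few plain gradient steps as the approximate inner solver; names such as minibatch_regularized_update, gamma, and inner_steps are illustrative.

```python
import numpy as np

def minibatch_regularized_update(w, X_batch, y_batch, gamma=1.0,
                                 inner_steps=5, lr=0.05):
    """Approximately minimize a conservatively regularized minibatch objective
    (illustrative sketch, least-squares loss assumed):

        g(u) = (1/(2|B|)) * ||X_batch @ u - y_batch||^2 + (gamma/2) * ||u - w||^2

    The proximal term keeps the update close to the current iterate w; the
    inner problem is solved only approximately with a few gradient steps.
    """
    u = w.copy()
    n = len(y_batch)
    for _ in range(inner_steps):
        grad_loss = X_batch.T @ (X_batch @ u - y_batch) / n   # minibatch loss gradient
        grad = grad_loss + gamma * (u - w)                    # conservative (proximal) term
        u -= lr * grad
    return u

# Toy usage: linear regression trained minibatch by minibatch.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(10)
batch_size = 100
for _ in range(20):                      # passes over the data
    idx = rng.permutation(1000)
    for start in range(0, 1000, batch_size):
        b = idx[start:start + batch_size]
        w = minibatch_regularized_update(w, X[b], y[b])

print("distance to w_true:", np.linalg.norm(w - w_true))
```

Increasing inner_steps spends more computation per minibatch on solving the regularized subproblem more accurately; this per-minibatch work versus communication trade-off is the setting the abstract describes.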

Original language: English (US)
Title of host publication: KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 661-670
Number of pages: 10
ISBN (Print): 9781450329569
DOIs
State: Published - 2014
Externally published: Yes
Event: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States
Duration: Aug 24 2014 - Aug 27 2014

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Country/Territory: United States
City: New York, NY
Period: 8/24/14 - 8/27/14

Keywords

  • big data
  • distributed computing
  • machine learning
  • minibatch
  • stochastic gradient descent

ASJC Scopus subject areas

  • Software
  • Information Systems
