High-dimensional variance-reduced stochastic gradient expectation-maximization algorithm

Rongda Zhu, Lingxiao Wang, Chengxiang Zhai, Quanquan Gu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We propose a generic stochastic expectation-maximization (EM) algorithm for the estimation of high-dimensional latent variable models. At the core of our algorithm is a novel semi-stochastic variance-reduced gradient designed for the Q-function in the EM algorithm. Under a mild condition on the initialization, our algorithm is guaranteed to attain a linear convergence rate to the unknown parameter of the latent variable model, and achieve an optimal statistical rate up to a logarithmic factor for parameter estimation. Compared with existing high-dimensional EM algorithms, our algorithm enjoys a better computational complexity and is therefore more efficient. We apply our generic algorithm to two illustrative latent variable models: Gaussian mixture model and mixture of linear regression, and demonstrate the advantages of our algorithm by both theoretical analysis and numerical experiments. We believe that the proposed semi-stochastic gradient is of independent interest for general nonconvex optimization problems with bivariate structures.

Original languageEnglish (US)
Title of host publication34th International Conference on Machine Learning, ICML 2017
PublisherInternational Machine Learning Society (IMLS)
Pages6337-6345
Number of pages9
ISBN (Electronic)9781510855144
StatePublished - Jan 1 2017
Event34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: Aug 6 2017Aug 11 2017

Publication series

Name34th International Conference on Machine Learning, ICML 2017
Volume8

Other

Other34th International Conference on Machine Learning, ICML 2017
CountryAustralia
CitySydney
Period8/6/178/11/17

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint Dive into the research topics of 'High-dimensional variance-reduced stochastic gradient expectation-maximization algorithm'. Together they form a unique fingerprint.

Cite this