Noisy Truncated SGD: Optimization and Generalization

Yingxue Zhou, Xinyan Li, Arindam Banerjee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent empirical work on stochastic gradient descent (SGD) applied to over-parameterized deep learning has shown that most gradient components over epochs are quite small. Inspired by such observations, we rigorously study properties of Truncated SGD (T-SGD), which truncates the majority of small gradient components to zero. Considering non-convex optimization problems, we show that the convergence rate of T-SGD matches the order of vanilla SGD. We also establish the generalization error bound for T-SGD. Further, we propose Noisy Truncated SGD (NT-SGD), which adds Gaussian noise to the truncated gradients. We prove that NT-SGD has the same convergence rate as T-SGD for non-convex optimization problems. We demonstrate that with the help of noise, NT-SGD can provably escape saddle points and requires less noise than previous related work. We also prove that NT-SGD achieves a better generalization error bound than T-SGD because of the noise. Our generalization analysis is based on uniform stability, and we show that additional noise in the gradient update can boost the stability. Our experiments on a variety of benchmark datasets with various network architectures verify that NT-SGD matches the speed and accuracy of vanilla SGD while effectively working with sparse gradients, and can successfully escape poor local minima.
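To make the update concrete, below is a minimal sketch of one NT-SGD step. The truncation rule (keep only the top fraction of gradient components by magnitude) and the fixed noise standard deviation are illustrative assumptions; the paper's exact truncation criterion and noise schedule may differ.

```python
import numpy as np

def nt_sgd_step(w, grad, lr=0.1, keep_frac=0.1, noise_std=1e-3, rng=None):
    """One NT-SGD update on a flat parameter vector w.

    Assumptions (not taken from the paper): truncation keeps the top
    `keep_frac` fraction of gradient components by magnitude, and the
    added noise is isotropic Gaussian with standard deviation `noise_std`.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = max(1, int(keep_frac * grad.size))
    # T-SGD part: zero out all but the k largest-magnitude gradient components.
    truncated = np.zeros_like(grad)
    top_idx = np.argpartition(np.abs(grad), -k)[-k:]
    truncated[top_idx] = grad[top_idx]
    # NT-SGD part: perturb the truncated gradient with Gaussian noise.
    noisy = truncated + noise_std * rng.standard_normal(grad.shape)
    return w - lr * noisy
```

In this sketch the noise is added after truncation, matching the abstract's description of NT-SGD as "adding Gaussian noise to the truncated gradients"; the noise both helps escape saddle points and, per the paper's stability analysis, improves the generalization bound.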

Original language: English (US)
Title of host publication: Proceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 468-476
Number of pages: 9
ISBN (Electronic): 9781611977172
State: Published - 2022
Externally published: Yes
Event: 2022 SIAM International Conference on Data Mining, SDM 2022 - Virtual, Online
Duration: Apr 28 2022 - Apr 30 2022

Publication series

Name: Proceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022

Conference

Conference: 2022 SIAM International Conference on Data Mining, SDM 2022
City: Virtual, Online
Period: 4/28/22 - 4/30/22

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
