TY - GEN
T1 - Noisy Truncated SGD
T2 - 2022 SIAM International Conference on Data Mining, SDM 2022
AU - Zhou, Yingxue
AU - Li, Xinyan
AU - Banerjee, Arindam
N1 - In this paper, we analyze Truncated SGD (T-SGD), a sparse gradient algorithm that reduces small gradient components to zero, and propose Noisy Truncated SGD (NT-SGD), a perturbed version of T-SGD that adds Gaussian noise to all components after gradient truncation. We establish the rate of convergence for T-SGD and NT-SGD, and prove that NT-SGD can escape from saddle points with the help of a small amount of injected Gaussian noise. We also derive generalization error bounds for T-SGD and NT-SGD based on uniform stability. We demonstrate that NT-SGD has a better generalization error bound than T-SGD, with considerably improved stability due to the noise. Empirical evidence demonstrates that both T-SGD and NT-SGD match the speed and accuracy of vanilla SGD, and that NT-SGD can successfully escape sharp minima, supporting our theoretical analysis. Acknowledgements. The research was supported by NSF grants IIS-1908104, OAC-1934634, IIS-1563950, and a C3.ai research award. We would like to thank the Minnesota Supercomputing Institute (MSI) for providing computational resources and support.
PY - 2022
Y1 - 2022
N2 - Recent empirical work on stochastic gradient descent (SGD) applied to over-parameterized deep learning has shown that most gradient components over epochs are quite small. Inspired by such observations, we rigorously study properties of Truncated SGD (T-SGD), which truncates the majority of small gradient components to zero. Considering non-convex optimization problems, we show that the convergence rate of T-SGD matches the order of vanilla SGD. We also establish the generalization error bound for T-SGD. Further, we propose Noisy Truncated SGD (NT-SGD), which adds Gaussian noise to the truncated gradients. We prove that NT-SGD has the same convergence rate as T-SGD for non-convex optimization problems. We demonstrate that, with the help of noise, NT-SGD can provably escape from saddle points and requires less noise than previous related work. We also prove that NT-SGD achieves a better generalization error bound than T-SGD because of the noise. Our generalization analysis is based on uniform stability, and we show that additional noise in the gradient update can boost stability. Our experiments on a variety of benchmark datasets with various network architectures verify that NT-SGD matches the speed and accuracy of vanilla SGD while effectively working with sparse gradients, and can successfully escape poor local minima.
AB - Recent empirical work on stochastic gradient descent (SGD) applied to over-parameterized deep learning has shown that most gradient components over epochs are quite small. Inspired by such observations, we rigorously study properties of Truncated SGD (T-SGD), which truncates the majority of small gradient components to zero. Considering non-convex optimization problems, we show that the convergence rate of T-SGD matches the order of vanilla SGD. We also establish the generalization error bound for T-SGD. Further, we propose Noisy Truncated SGD (NT-SGD), which adds Gaussian noise to the truncated gradients. We prove that NT-SGD has the same convergence rate as T-SGD for non-convex optimization problems. We demonstrate that, with the help of noise, NT-SGD can provably escape from saddle points and requires less noise than previous related work. We also prove that NT-SGD achieves a better generalization error bound than T-SGD because of the noise. Our generalization analysis is based on uniform stability, and we show that additional noise in the gradient update can boost stability. Our experiments on a variety of benchmark datasets with various network architectures verify that NT-SGD matches the speed and accuracy of vanilla SGD while effectively working with sparse gradients, and can successfully escape poor local minima.
UR - http://www.scopus.com/inward/record.url?scp=85131302971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131302971&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85131302971
T3 - Proceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022
SP - 468
EP - 476
BT - Proceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 28 April 2022 through 30 April 2022
ER -
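
Editorial note: the abstract describes the NT-SGD update at a high level (truncate small gradient components to zero, then add Gaussian noise before the SGD step). The following is a minimal illustrative sketch of that idea, not taken from the paper or this record; the threshold tau, noise scale sigma, and magnitude-threshold truncation rule are assumptions, and the paper's actual truncation and noise calibration may differ.

import numpy as np

def nt_sgd_step(w, grad, lr, tau, sigma, rng):
    # Truncation: zero out gradient components with magnitude below tau (assumed rule).
    g = np.where(np.abs(grad) >= tau, grad, 0.0)
    # Perturbation: add Gaussian noise to all components after truncation.
    g = g + sigma * rng.standard_normal(g.shape)
    # Vanilla SGD update with the noisy truncated gradient.
    return w - lr * g

# Example usage with assumed hyperparameter values:
# rng = np.random.default_rng(0)
# w = nt_sgd_step(w, grad, lr=0.1, tau=1e-3, sigma=1e-4, rng=rng)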