TY - JOUR
T1 - CSER: Communication-efficient SGD with error reset
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Xie, Cong
AU - Zheng, Shuai
AU - Koyejo, Oluwasanmi
AU - Gupta, Indranil
AU - Li, Mu
AU - Lin, Haibin
N1 - Funding Information:
This work was funded in part by the following grants: NSF IIS 1909577, NSF CNS 1908888, NSF CCF 1934986 and a JP Morgan Chase Fellowship, along with computational resources donated by Intel, AWS, and Microsoft Azure.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
N2 - The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is, first, a new technique called “error reset” that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging advantages from both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that when combined with highly aggressive compressors, the CSER algorithms accelerate distributed training by nearly 10× for CIFAR-100, and by 4.5× for ImageNet.
AB - The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is, first, a new technique called “error reset” that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging advantages from both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that when combined with highly aggressive compressors, the CSER algorithms accelerate distributed training by nearly 10× for CIFAR-100, and by 4.5× for ImageNet.
UR - http://www.scopus.com/inward/record.url?scp=85107969427&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107969427&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85107969427
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 6 December 2020 through 12 December 2020
ER -