TY - GEN
T1 - Zeno++
T2 - 37th International Conference on Machine Learning, ICML 2020
AU - Xie, Cong
AU - Koyejo, Oluwasanmi
AU - Gupta, Indranil
N1 - Publisher Copyright: Copyright 2020 by the author(s).
PY - 2020
Y1 - 2020
N2 - We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure, intended to tolerate Byzantine failures of workers. In contrast to previous work, Zeno++ removes several unrealistic restrictions on worker-server communication, now allowing for fully asynchronous updates from anonymous workers, for arbitrarily stale worker updates, and for the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing Byzantine-tolerant asynchronous SGD algorithms.
UR - http://www.scopus.com/inward/record.url?scp=85105420474&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105420474&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105420474
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 10426
EP - 10434
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daume, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
Y2 - 13 July 2020 through 18 July 2020
ER -