TY - GEN
T1 - Zeno
T2 - 36th International Conference on Machine Learning, ICML 2019
AU - Xie, Cong
AU - Koyejo, Oluwasanmi
AU - Gupta, Indranil
N1 - Publisher Copyright: Copyright © 2019 ASME
PY - 2019
Y1 - 2019
N2 - We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
UR - http://www.scopus.com/inward/record.url?scp=85078259889&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078259889&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85078259889
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 11928
EP - 11944
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
Y2 - 9 June 2019 through 15 June 2019
ER -