TY - JOUR
T1 - The role of local steps in local SGD
AU - Qin, Tiancheng
AU - Etesami, S. Rasoul
AU - Uribe, César A.
N1 - Publisher Copyright:
© 2023 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2023/8/7
Y1 - 2023/8/7
AB - We consider the distributed stochastic optimization problem in which n agents seek to minimize a global function given by the sum of the agents' local functions, focusing on the heterogeneous setting in which the agents' local functions are defined over non-i.i.d. datasets. We study the Local SGD method, in which agents perform a number of local stochastic gradient steps and occasionally communicate with a central node to improve their local optimization tasks. We analyze the effect of local steps on the convergence rate and the communication complexity of Local SGD. In particular, instead of assuming a fixed number of local steps across all communication rounds, we allow the number of local steps during the j-th communication round, $H_j$, to be arbitrary. Our main contribution is to characterize the convergence rate of Local SGD as a function of $\{H_j\}_{j=1}^{R}$ under various settings of strongly convex, convex, and nonconvex local functions, where R is the total number of communication rounds. Based on this characterization, we provide sufficient conditions on the sequence $\{H_j\}_{j=1}^{R}$ under which Local SGD achieves linear speedup with respect to the number of workers. Furthermore, we propose a new communication strategy with increasing local steps that is superior to constant local steps for strongly convex local functions. On the other hand, for convex and nonconvex local functions, we argue that fixed local steps are the best communication strategy for Local SGD, and we recover state-of-the-art convergence rate results. Finally, we validate our theoretical results through extensive numerical experiments.
KW - Federated learning
KW - distributed optimization
KW - local SGD
UR - http://www.scopus.com/inward/record.url?scp=85166968917&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166968917&partnerID=8YFLogxK
U2 - 10.1080/10556788.2023.2241151
DO - 10.1080/10556788.2023.2241151
M3 - Article
AN - SCOPUS:85166968917
SN - 1055-6788
JO - Optimization Methods and Software
JF - Optimization Methods and Software
ER -