Faster Convergence of Local SGD for Over-Parameterized Models

Research output: Contribution to journal › Article › peer-review

Abstract

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general convex loss functions, we establish an error bound of O(1/T) under a mild data similarity assumption and an error bound of O(K/T) otherwise, where K is the number of local steps and T is the total number of iterations. For non-convex loss functions we prove an error bound of O(K/T). These bounds improve upon the best previous bound of O(1/√(nT)) in both cases, where n is the number of nodes, which holds when no assumption on the model being over-parameterized is made. We complete our results by providing problem instances in which our established convergence rates are tight up to a constant factor for a reasonably small stepsize. Finally, we validate our theoretical results with large-scale numerical experiments that reveal the convergence behavior of Local SGD for practical over-parameterized deep learning models, in which the O(1/T) convergence rate of Local SGD is clearly shown.
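For concreteness, the sketch below illustrates the Local SGD (FedAvg) procedure the abstract refers to: each of n nodes runs K local SGD steps on its own data, and the server averages the local models, for a total of T local iterations. The NumPy implementation, the squared loss, and the synthetic interpolation-friendly data are illustrative assumptions for this sketch, not material from the paper.

import numpy as np

def local_sgd(data, w0, K, T, lr):
    # Minimal Local SGD (FedAvg) sketch: each node runs K local SGD steps
    # on its own data, then the server averages the local models.
    # T is the total number of local steps, so there are T // K rounds.
    w = w0.copy()
    for _ in range(T // K):                        # communication rounds
        local_models = []
        for X, y in data:                          # each node starts from the averaged model
            w_i = w.copy()
            for _ in range(K):                     # K local SGD steps
                j = np.random.randint(len(y))
                grad = (X[j] @ w_i - y[j]) * X[j]  # stochastic gradient of the squared loss
                w_i -= lr * grad
            local_models.append(w_i)
        w = np.mean(local_models, axis=0)          # averaging (FedAvg) step
    return w

# Illustrative over-parameterized setup: dimension d exceeds the samples per node
# and labels are noiseless, so each node can drive its empirical loss to zero.
rng = np.random.default_rng(0)
d, m, n_nodes = 50, 10, 4
w_star = rng.normal(size=d)
data = [(X, X @ w_star) for X in (rng.normal(size=(m, d)) for _ in range(n_nodes))]
w_hat = local_sgd(data, np.zeros(d), K=5, T=500, lr=0.01)

With K = 1 the procedure reduces to fully synchronous parallel SGD; larger K saves communication at the cost of the extra K-dependence in the O(K/T) bounds discussed in the abstract.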

Original language: English (US)
Journal: Transactions on Machine Learning Research
Volume: 2024
State: Published - 2024

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
