TY - JOUR
T1 - Adaptive Step-Size Methods for Compressed SGD With Memory Feedback
AU - Subramaniam, Adarsh M.
AU - Magesh, Akshayaa
AU - Veeravalli, Venugopal V.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Distributed and decentralized optimization problems, such as those encountered in federated learning, often suffer from a communication bottleneck at compute nodes when a large number of messages are aggregated. To address this communication bottleneck, compressed Stochastic Gradient Descent (SGD) algorithms have been proposed. Most existing compressed SGD algorithms use non-adaptive step-sizes (constant or diminishing) that depend on unknown system parameters to provide theoretical convergence guarantees, with the step-sizes being fine-tuned in practice to obtain good empirical performance for a given dataset and learning model. Such fine-tuning may be impractical in many scenarios, so it is of interest to study compressed SGD with adaptive step-sizes that do not depend on unknown system parameters. Motivated by prior work that employs adaptive step-sizes for uncompressed SGD, we develop an Armijo-rule-based step-size selection method for compressed SGD with feedback. In particular, we first motivate a novel scaling technique for Gradient Descent (GD) with Armijo step-size search, based on which we develop an Armijo step-size search method with scaling for the descent step of the compressed SGD algorithm with memory feedback. We then establish that our algorithm has order-optimal convergence rates for convex-smooth and strongly-convex-smooth objectives under an interpolation condition, and for non-convex objectives under a strong growth condition. Experiments comparing our proposed algorithm with state-of-the-art compressed SGD methods with memory feedback demonstrate a notable improvement in training loss across various levels of compression.
KW - Adaptive step-sizes
KW - compression
KW - distributed optimization
KW - memory feedback
KW - stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85190171618&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190171618&partnerID=8YFLogxK
U2 - 10.1109/TSP.2024.3385577
DO - 10.1109/TSP.2024.3385577
M3 - Article
AN - SCOPUS:85190171618
SN - 1053-587X
VL - 72
SP - 2394
EP - 2406
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
ER -
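
Editor's note: to make the abstract's algorithmic idea concrete, the following is a minimal, illustrative Python sketch, not the authors' published algorithm, of a single worker's update for compressed SGD with memory (error) feedback in which the descent step-size is chosen by Armijo backtracking rather than a hand-tuned constant. The compressor top_k, the loss callable f, and the parameters eta_max, beta, c, and k are assumptions made for this sketch; the paper's scaling technique is not reproduced here.

import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries of v and zero out the rest
    # (a standard sparsifying compressor, assumed here for illustration).
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def armijo_step(f, x, direction, grad, eta_max=1.0, beta=0.5, c=0.5, max_backtracks=50):
    # Backtracking Armijo search: shrink eta until the sufficient-decrease
    # condition f(x - eta*d) <= f(x) - c*eta*<grad, d> holds.
    eta = eta_max
    fx = f(x)
    slope = float(np.dot(grad, direction))
    for _ in range(max_backtracks):
        if f(x - eta * direction) <= fx - c * eta * slope:
            break
        eta *= beta
    return eta

def compressed_sgd_step(f, grad_f, x, memory, k):
    # One iteration of compressed SGD with memory (error) feedback:
    # compress (gradient + memory), store the compression residual back
    # into memory, and descend along the compressed direction with an
    # Armijo-selected step-size.
    g = grad_f(x)                      # stochastic gradient at x
    corrected = g + memory             # re-inject previously dropped information
    compressed = top_k(corrected, k)   # only this compressed vector is communicated
    memory = corrected - compressed    # memory feedback: remember what was dropped
    eta = armijo_step(f, x, compressed, g)
    return x - eta * compressed, memory

The memory term accumulates whatever the compressor discards, so dropped information is re-injected into later updates, while the Armijo search replaces the system-parameter-dependent constant step-size discussed in the abstract.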