Adaptive Step-Size Methods for Compressed SGD

Adarsh M. Subramaniam, Akshayaa Magesh, Venugopal V. Veeravalli

Research output: Contribution to journal › Conference article › peer-review


Compressed Stochastic Gradient Descent (SGD) algorithms have been proposed to address the communication bottleneck in distributed and decentralized optimization problems such as federated machine learning. Many existing compressed SGD algorithms use non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. Since non-adaptive step-sizes typically involve unknown system parameters, the step-sizes are fine-tuned in practice to obtain good empirical performance for the given dataset and learning model. Such fine-tuning might be impractical in many scenarios. Thus, it is of interest to study compressed SGD with adaptive step-sizes. Motivated by prior work that uses adaptive step-sizes for uncompressed SGD, we develop an Armijo-rule-based step-size selection method for compressed SGD. In particular, we introduce a scaling technique for the descent step, which we use to establish order-optimal convergence rates for convex smooth and strongly convex smooth objectives under an interpolation condition, and for non-convex objectives under a strong growth condition. We present experimental results on deep neural networks trained on real-world datasets, and compare the performance of our proposed algorithm with state-of-the-art compressed SGD methods to demonstrate improved performance at various levels of compression.
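To make the idea concrete, here is a minimal sketch of one compressed-gradient descent step with Armijo backtracking. The compressor (top-k sparsification), the sufficient-decrease condition, and all parameter names (`gamma`, `beta`, `eta0`) are illustrative assumptions, not the paper's exact algorithm or its scaling technique.

```python
import numpy as np

def top_k(v, k):
    """Top-k sparsification: keep the k largest-magnitude entries
    (a common gradient compressor; an assumption here, not the paper's choice)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def armijo_compressed_step(f, grad, x, k, gamma=0.1, beta=0.5,
                           eta0=1.0, max_backtracks=30):
    """One step of compressed gradient descent with an Armijo
    backtracking step-size: shrink eta until
        f(x - eta * c) <= f(x) - gamma * eta * <g, c>
    holds, where c is the compressed gradient."""
    g = grad(x)
    c = top_k(g, k)
    eta = eta0
    for _ in range(max_backtracks):
        if f(x - eta * c) <= f(x) - gamma * eta * g.dot(c):
            break
        eta *= beta  # backtrack: step was too aggressive
    return x - eta * c, eta
```

On a simple quadratic objective, iterating this step drives the loss to zero without any hand-tuned step-size, which is the practical appeal of Armijo-type rules described in the abstract.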


Keywords

  • Adaptive step-size
  • Armijo rule
  • compressed SGD
  • federated learning
  • scaling
  • stochastic gradient descent

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering


