Stochastic gradient descent is a common algorithm used in machine learning, especially when the loss function is in a separable form. Here, we consider SGD for constrained convex optimization problems, where the constraint is expressed as an intersection of a large number of convex sets. It is expensive to project the result of the gradient descent algorithm on each of these convex sets at each iteration; thus, there is a growing body of work which considers projections onto a random subset of the convex sets. In this paper, we consider a distributed version (parameter-server model) of the random projections since a centralized approach is too slow for very large-scale problems. Our main result is as follows: under a mild regularity condition on the convex sets, we show that the rate of convergence of distributed SGD with distributed random projections is the same as that of distributed SGD applied to a problem with no constraints, except for a factor which captures the regularity assumption.
- Distributed optimization
- random projections
ASJC Scopus subject areas
- Control and Systems Engineering