TY - JOUR
T1 - Analysis of biased stochastic gradient descent using sequential semidefinite programs
AU - Hu, Bin
AU - Seiler, Peter
AU - Lessard, Laurent
N1 - Funding Information:
This work is supported by NSF Awards 1656951, 1750162, and 1254129, and by the NASA Langley NRA Cooperative Agreement NNX12AM55A. Bin Hu and Laurent Lessard also acknowledge support from the Wisconsin Institute for Discovery, the College of Engineering, and the Department of Electrical and Computer Engineering at the University of Wisconsin–Madison.
Publisher Copyright:
© 2020, Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society.
PY - 2021/5
Y1 - 2021/5
N2 - We present a convergence rate analysis for biased stochastic gradient descent (SGD), where individual gradient updates are corrupted by computation errors. We develop stochastic quadratic constraints to formulate a small linear matrix inequality (LMI) whose feasible points lead to convergence bounds of biased SGD. Based on this LMI condition, we develop a sequential minimization approach to analyze the intricate trade-offs that couple stepsize selection, convergence rate, optimization accuracy, and robustness to gradient inaccuracy. We also provide feasible points for this LMI and obtain theoretical formulas that quantify the convergence properties of biased SGD under various assumptions on the loss functions.
AB - We present a convergence rate analysis for biased stochastic gradient descent (SGD), where individual gradient updates are corrupted by computation errors. We develop stochastic quadratic constraints to formulate a small linear matrix inequality (LMI) whose feasible points lead to convergence bounds of biased SGD. Based on this LMI condition, we develop a sequential minimization approach to analyze the intricate trade-offs that couple stepsize selection, convergence rate, optimization accuracy, and robustness to gradient inaccuracy. We also provide feasible points for this LMI and obtain theoretical formulas that quantify the convergence properties of biased SGD under various assumptions on the loss functions.
KW - Biased stochastic gradient
KW - Convergence rates
KW - Convex optimization
KW - First-order methods
KW - Robustness to inexact gradient
UR - http://www.scopus.com/inward/record.url?scp=85083369930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083369930&partnerID=8YFLogxK
U2 - 10.1007/s10107-020-01486-1
DO - 10.1007/s10107-020-01486-1
M3 - Article
AN - SCOPUS:85083369930
SN - 0025-5610
VL - 187
SP - 383
EP - 408
JO - Mathematical Programming
JF - Mathematical Programming
IS - 1-2
ER -