TY - JOUR
T1 - Policy Optimization For H2 Linear Control With H∞ Robustness Guarantee
T2 - Implicit Regularization And Global Convergence
AU - Zhang, Kaiqing
AU - Hu, Bin
AU - Başar, Tamer
N1 - Publisher Copyright:
© 2021 Society for Industrial and Applied Mathematics Publications. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Policy optimization (PO) is a key ingredient for modern reinforcement learning. For control design, certain constraints are usually enforced on the policies to optimize, accounting for stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the closed-loop stability, or the H∞-norm constraint that guarantees the system robustness, can be difficult to enforce on the controller being learned as the PO methods proceed. In this paper, we study the convergence theory of PO for H2 linear control with H∞ robustness guarantee. This general framework includes risk-sensitive linear control as a special case. One significant new feature of this problem, in contrast to the standard H2 linear control, namely, linear quadratic regulator problems, is the lack of coercivity of the cost function. This makes it challenging to guarantee the feasibility, namely, the H∞ robustness, of the iterates. Interestingly, we propose two PO algorithms that enjoy the implicit regularization property, i.e., the iterates preserve the H∞ robustness automatically, as if they are regularized. Furthermore, despite the nonconvexity of the problem, we show that these algorithms converge to a certain globally optimal policy with globally sublinear rates, without getting stuck at any other possibly suboptimal stationary points, and with locally (super)linear rates under additional conditions. To the best of our knowledge, our work offers the first results on the implicit regularization property and global convergence of PO methods for robust/risk-sensitive control.
AB - Policy optimization (PO) is a key ingredient for modern reinforcement learning. For control design, certain constraints are usually enforced on the policies to optimize, accounting for stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the closed-loop stability, or the H∞-norm constraint that guarantees the system robustness, can be difficult to enforce on the controller being learned as the PO methods proceed. In this paper, we study the convergence theory of PO for H2 linear control with H∞ robustness guarantee. This general framework includes risk-sensitive linear control as a special case. One significant new feature of this problem, in contrast to the standard H2 linear control, namely, linear quadratic regulator problems, is the lack of coercivity of the cost function. This makes it challenging to guarantee the feasibility, namely, the H∞ robustness, of the iterates. Interestingly, we propose two PO algorithms that enjoy the implicit regularization property, i.e., the iterates preserve the H∞ robustness automatically, as if they are regularized. Furthermore, despite the nonconvexity of the problem, we show that these algorithms converge to a certain globally optimal policy with globally sublinear rates, without getting stuck at any other possibly suboptimal stationary points, and with locally (super)linear rates under additional conditions. To the best of our knowledge, our work offers the first results on the implicit regularization property and global convergence of PO methods for robust/risk-sensitive control.
KW - global convergence
KW - implicit regularization
KW - learning for control
KW - policy optimization
KW - robust control
UR - http://www.scopus.com/inward/record.url?scp=85121022765&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121022765&partnerID=8YFLogxK
U2 - 10.1137/20M1347942
DO - 10.1137/20M1347942
M3 - Article
AN - SCOPUS:85121022765
SN - 0363-0129
VL - 59
SP - 4081
EP - 4109
JO - SIAM Journal on Control and Optimization
JF - SIAM Journal on Control and Optimization
IS - 6
ER -