Policy Optimization For H2 Linear Control With H∞ Robustness Guarantee: Implicit Regularization And Global Convergence

Kaiqing Zhang, Bin Hu, Tamer Basar

Research output: Contribution to journal › Article › peer-review

Abstract

Policy optimization (PO) is a key ingredient for modern reinforcement learning. For control design, certain constraints are usually enforced on the policies to optimize, accounting for stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the closed-loop stability, or the H∞-norm constraint that guarantees the system robustness, can be difficult to enforce on the controller being learned as the PO methods proceed. In this paper, we study the convergence theory of PO for H2 linear control with H∞ robustness guarantee. This general framework includes risk-sensitive linear control as a special case. One significant new feature of this problem, in contrast to the standard H2 linear control, namely, linear quadratic regulator problems, is the lack of coercivity of the cost function. This makes it challenging to guarantee the feasibility, namely, the H∞ robustness, of the iterates. Interestingly, we propose two PO algorithms that enjoy the implicit regularization property, i.e., the iterates preserve the H∞ robustness automatically, as if they are regularized. Furthermore, despite the nonconvexity of the problem, we show that these algorithms converge to a certain globally optimal policy with globally sublinear rates, without getting stuck at any other possibly suboptimal stationary points, and with locally (super)linear rates under additional conditions. To the best of our knowledge, our work offers the first results on the implicit regularization property and global convergence of PO methods for robust/risk-sensitive control.

Original language: English (US)
Pages (from-to): 4081-4109
Number of pages: 29
Journal: SIAM Journal on Control and Optimization
Volume: 59
Issue number: 6
State: Published - 2021

Keywords

  • global convergence
  • implicit regularization
  • learning for control
  • policy optimization
  • robust control

ASJC Scopus subject areas

  • Control and Optimization
  • Applied Mathematics
