TY - JOUR
T1 - Policy Optimization for Markovian Jump Linear Quadratic Control
T2 - Gradient Method and Global Convergence
AU - Jansch-Porto, Joao Paulo
AU - Hu, Bin
AU - Dullerud, Geir E.
N1 - The work of Joao Paulo Jansch-Porto and Geir E. Dullerud was supported by the NSF under Grant ECCS 19-32735. The work of Bin Hu was supported by the NSF Award CAREER-2048168.
PY - 2023/4/1
Y1 - 2023/4/1
N2 - Recently, policy optimization has received renewed attention from the control community due to various applications in reinforcement learning tasks. In this article, we investigate the global convergence of the gradient method for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS with static state-feedback controllers and quadratic performance costs. Despite the nonconvexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and smoothness. Based on these properties, we prove that the gradient method converges to the optimal state-feedback controller for MJLS at a linear rate if initialized at a mean-square stabilizing controller. This article offers new insights into the performance of the policy gradient method on the Markovian jump linear quadratic control problem.
AB - Recently, policy optimization has received renewed attention from the control community due to various applications in reinforcement learning tasks. In this article, we investigate the global convergence of the gradient method for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS with static state-feedback controllers and quadratic performance costs. Despite the nonconvexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and smoothness. Based on these properties, we prove that the gradient method converges to the optimal state-feedback controller for MJLS at a linear rate if initialized at a mean-square stabilizing controller. This article offers new insights into the performance of the policy gradient method on the Markovian jump linear quadratic control problem.
KW - Markovian jump linear systems (MJLS)
KW - optimal control
KW - policy gradient methods
KW - reinforcement learning (RL)
UR - http://www.scopus.com/inward/record.url?scp=85130449174&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130449174&partnerID=8YFLogxK
U2 - 10.1109/TAC.2022.3176439
DO - 10.1109/TAC.2022.3176439
M3 - Article
AN - SCOPUS:85130449174
SN - 0018-9286
VL - 68
SP - 2475
EP - 2482
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 4
ER -
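
A minimal NumPy sketch (ours, not from the paper) of one exact policy-gradient step of the kind the abstract describes, for modes i = 1..N with closed-loop dynamics x_{k+1} = (A_i - B_i K_i) x_k, a Markov transition matrix Pr, initial mode distribution pi0, and initial state covariance X0. The coupled Lyapunov fixed-point solvers, tolerances, and all variable names are illustrative assumptions; the closed-form gradient is the standard MJLS generalization of the familiar LQR expression.

import numpy as np

def coupled_lyap_cost(A, B, Q, R, K, Pr, tol=1e-10, iters=10000):
    # Solve the coupled Lyapunov equations
    #   P_i = Q_i + K_i' R_i K_i + (A_i - B_i K_i)' E_i(P) (A_i - B_i K_i),
    # where E_i(P) = sum_j Pr[i, j] P_j, by fixed-point iteration.
    # The iteration converges when K is mean-square stabilizing.
    N, n = len(A), A[0].shape[0]
    P = [np.zeros((n, n)) for _ in range(N)]
    for _ in range(iters):
        EP = [sum(Pr[i, j] * P[j] for j in range(N)) for i in range(N)]
        Pn = [Q[i] + K[i].T @ R[i] @ K[i]
              + (A[i] - B[i] @ K[i]).T @ EP[i] @ (A[i] - B[i] @ K[i])
              for i in range(N)]
        if max(np.abs(Pn[i] - P[i]).max() for i in range(N)) < tol:
            return Pn
        P = Pn
    raise RuntimeError("no convergence: K may not be mean-square stabilizing")

def accumulated_covariances(A, B, K, Pr, pi0, X0, tol=1e-10, iters=10000):
    # Accumulate S_i = sum_k E[x_k x_k' 1{mode_k = i}] via the adjoint
    # coupled recursion S_j = pi0_j X0 + sum_i Pr[i, j] Gam_i S_i Gam_i'.
    N = len(A)
    Gam = [A[i] - B[i] @ K[i] for i in range(N)]
    S = [pi0[i] * X0 for i in range(N)]
    for _ in range(iters):
        Sn = [pi0[j] * X0
              + sum(Pr[i, j] * Gam[i] @ S[i] @ Gam[i].T for i in range(N))
              for j in range(N)]
        if max(np.abs(Sn[i] - S[i]).max() for i in range(N)) < tol:
            return Sn
        S = Sn
    raise RuntimeError("no convergence")

def policy_gradient_step(A, B, Q, R, K, Pr, pi0, X0, eta):
    # One step K_i <- K_i - eta * grad_i J(K), using the closed form
    #   grad_i J = 2[(R_i + B_i' E_i(P) B_i) K_i - B_i' E_i(P) A_i] S_i,
    # and the cost J(K) = sum_i pi0_i * tr(P_i X0).
    N = len(A)
    P = coupled_lyap_cost(A, B, Q, R, K, Pr)
    S = accumulated_covariances(A, B, K, Pr, pi0, X0)
    EP = [sum(Pr[i, j] * P[j] for j in range(N)) for i in range(N)]
    grad = [2 * ((R[i] + B[i].T @ EP[i] @ B[i]) @ K[i]
                 - B[i].T @ EP[i] @ A[i]) @ S[i] for i in range(N)]
    cost = sum(pi0[i] * np.trace(P[i] @ X0) for i in range(N))
    return [K[i] - eta * grad[i] for i in range(N)], cost

Iterating policy_gradient_step from any mean-square stabilizing initial gain with a sufficiently small step size eta is the scheme whose linear convergence the article establishes.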