TY - CONF
T1 - A Case Study on the Convergence of Direct Policy Search for Linear Quadratic Gaussian Control
AU - Keivan, Darioush
AU - Seiler, Peter
AU - Dullerud, Geir
AU - Hu, Bin
N1 - Publisher Copyright:
© 2024 AACC.
PY - 2024
Y1 - 2024
AB - Policy optimization has gained renewed attention from the control community, serving as a pivotal link between control theory and reinforcement learning. In the past few years, a global convergence theory of direct policy search on state-feedback linear control benchmarks has been developed. However, it remains difficult to establish the global convergence of policy optimization on the linear quadratic Gaussian (LQG) problem, which is marked by the presence of suboptimal stationary points and the lack of cost coerciveness. In this paper, we revisit the policy optimization intricacies of LQG via a case study on first-order single-input single-output (SISO) systems. For this case study, while the issue of suboptimal stationary points can be easily fixed by parameterizing the policy class more carefully, the non-coerciveness of the LQG cost function still poses a substantial obstacle to a straightforward global convergence proof for the policy gradient method. Our contribution, within the scope of this case study, is an approach for constructing a positive invariant set for the policy gradient flow, which addresses the non-coerciveness issue in the global convergence proof. Based on our analysis, the policy gradient flow is guaranteed to converge to the globally optimal full-order dynamic controller in this particular scenario. In summary, although centered on a specific case study, our work broadens the understanding of how the absence of coerciveness impacts LQG policy optimization and highlights its inherent complexities.
UR - http://www.scopus.com/inward/record.url?scp=85204436225&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204436225&partnerID=8YFLogxK
U2 - 10.23919/ACC60939.2024.10644427
DO - 10.23919/ACC60939.2024.10644427
M3 - Conference contribution
AN - SCOPUS:85204436225
T3 - Proceedings of the American Control Conference
SP - 3710
EP - 3715
BT - 2024 American Control Conference, ACC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 American Control Conference, ACC 2024
Y2 - 10 July 2024 through 12 July 2024
ER -