TY - JOUR
T1 - Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control
AU - Cui, Leilei
AU - Başar, Tamer
AU - Jiang, Zhong-Ping
N1 - Manuscript received 11 December 2023; revised 12 March 2024; accepted 21 April 2024. Date of publication 7 May 2024; date of current version 25 October 2024. This work was supported in part by NSF under Grant CNS-2148309 and Grant ECCS-2210320 and in part by ARO under Grant W911NF-24-1-0085. Recommended by Associate Editor Z. Shu. (Corresponding author: Leilei Cui.) Leilei Cui and Zhong-Ping Jiang are with the Control and Networks Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY 11201 USA.
PY - 2024
Y1 - 2024
N2 - This article proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics is unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
AB - This article proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics is unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
KW - Policy optimization (PO)
KW - risk-sensitive linear quadratic Gaussian (LQG)
KW - robust reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85192998928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192998928&partnerID=8YFLogxK
U2 - 10.1109/TAC.2024.3397928
DO - 10.1109/TAC.2024.3397928
M3 - Article
AN - SCOPUS:85192998928
SN - 0018-9286
VL - 69
SP - 7678
EP - 7693
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 11
ER -