A Reinforcement Learning Look at Risk-Sensitive Linear Quadratic Gaussian Control

Leilei Cui, Tamer Başar, Zhong-Ping Jiang

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we propose a robust reinforcement learning method for a class of linear discrete-time systems to handle model mismatches that may be induced by the sim-to-real gap. Under the formulation of risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to iteratively approximate the robust and optimal controller. The convergence and robustness of the dual-loop policy optimization algorithm are rigorously analyzed: the algorithm is shown to converge uniformly to the optimal solution, and, by invoking the concept of small-disturbance input-to-state stability, it is guaranteed to still converge to a neighborhood of the optimal solution when it is subject to a sufficiently small disturbance at each step. When the system matrices are unknown, a learning-based off-policy policy optimization algorithm is proposed for the same class of linear systems with additive Gaussian noise. Numerical simulations demonstrate the efficacy of the proposed algorithm.
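
The paper's exact dual-loop algorithm is stated in the article itself; purely to illustrate the dual-loop structure the abstract describes, the following is a minimal model-based sketch of policy iteration for the zero-sum linear quadratic game associated with risk-sensitive LQG control. The system matrices, the risk level gamma, and both update rules below are illustrative assumptions, not the paper's algorithm: the inner loop improves the worst-case disturbance gain L for a fixed controller gain K, and the outer loop improves K against the converged worst case.

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    # Illustrative system x_{k+1} = A x_k + B u_k + D w_k (matrices assumed known here)
    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])
    B = np.array([[0.0],
                  [0.1]])
    D = np.array([[0.05],
                  [0.05]])
    Q = np.eye(2)        # state weight
    R = np.eye(1)        # control weight
    gamma = 5.0          # risk-sensitivity / attenuation level (assumed)

    K = np.zeros((1, 2))  # initial gain; assumed stabilizing (A itself is Schur here)

    for i in range(50):                 # outer loop: improve controller gain K
        L = np.zeros((1, 2))            # inner loop: worst-case disturbance gain L
        for j in range(50):
            A_cl = A - B @ K + D @ L
            # Policy evaluation: P = A_cl' P A_cl + Q + K' R K - gamma^2 L' L
            P = solve_discrete_lyapunov(A_cl.T,
                                        Q + K.T @ R @ K - gamma**2 * L.T @ L)
            # Maximizer update: L = (gamma^2 I - D' P D)^{-1} D' P (A - B K)
            L_new = np.linalg.solve(gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D,
                                    D.T @ P @ (A - B @ K))
            if np.linalg.norm(L_new - L) < 1e-10:
                L = L_new
                break
            L = L_new
        # Minimizer update against the converged worst case:
        # K = (R + B' P B)^{-1} B' P (A + D L)
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ (A + D @ L))
        if np.linalg.norm(K_new - K) < 1e-10:
            K = K_new
            break
        K = K_new

    print("approximate robust-optimal gain K:\n", K)

In the model-free setting the abstract mentions, the same evaluation and update steps would be estimated from input-state data via off-policy learning rather than computed from (A, B, D).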

Original language: English (US)
Pages (from-to): 534-546
Number of pages: 13
Journal: Proceedings of Machine Learning Research
Volume: 211
State: Published - 2023
Event: 5th Annual Conference on Learning for Dynamics and Control, L4DC 2023 - Philadelphia, United States
Duration: Jun 15 2023 - Jun 16 2023

Keywords

  • Robust reinforcement learning
  • input-to-state stability (ISS)
  • policy optimization (PO)

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
