Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Leilei Cui, Tamer Başar, Zhong-Ping Jiang

Research output: Contribution to journal › Article › peer-review

Abstract

This article proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Based on the formulation of classical risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust to disturbances during the learning process. This robustness property, called small-disturbance input-to-state stability, guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
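To make the dual-loop structure concrete, the sketch below illustrates one way such an algorithm can be organized when the risk-sensitive LQG problem is viewed as a discrete-time linear quadratic zero-sum game: an inner loop performs policy iteration on a worst-case disturbance gain for a fixed controller gain, and an outer loop improves the controller gain using the resulting value matrix. This is a minimal illustrative sketch, not the article's exact model-based or model-free algorithm; the system matrices, the risk parameter gamma, the update formulas, and the function names (lyap_dt, inner_loop, outer_loop) are assumptions made here for illustration.

```python
# Illustrative dual-loop policy optimization sketch for a discrete-time
# risk-sensitive / H-infinity-type LQ problem posed as a zero-sum game:
#   x_{k+1} = A x_k + B u_k + D w_k,
#   J = sum_k ( x_k' Q x_k + u_k' R u_k - gamma^2 |w_k|^2 ).
# All matrices, gains, and update rules below are assumptions for this sketch.
import numpy as np

def lyap_dt(Acl, Qcl, iters=1000):
    """Fixed-point solution of P = Qcl + Acl' P Acl (Acl assumed Schur-stable)."""
    P = np.zeros_like(Qcl)
    for _ in range(iters):
        P = Qcl + Acl.T @ P @ Acl
    return P

def inner_loop(A, B, D, Q, R, gamma, K, n_inner=30):
    """For a fixed controller gain K, policy iteration on the worst-case
    disturbance gain L; returns the resulting value matrix P_K."""
    AK = A - B @ K
    QK = Q + K.T @ R @ K
    L = np.zeros((D.shape[1], A.shape[0]))
    for _ in range(n_inner):
        Acl = AK + D @ L                       # closed loop with current disturbance policy
        P = lyap_dt(Acl, QK - gamma**2 * L.T @ L)
        L = np.linalg.solve(gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D,
                            D.T @ P @ AK)      # improved disturbance gain
    return P

def outer_loop(A, B, D, Q, R, gamma, K0, n_outer=20):
    """Dual-loop policy optimization: the outer loop improves the controller
    gain K using the value matrix returned by the inner loop."""
    K = K0
    for _ in range(n_outer):
        P = inner_loop(A, B, D, Q, R, gamma, K)
        # Risk-adjusted value matrix that absorbs the worst-case disturbance.
        Pbar = P + P @ D @ np.linalg.solve(
            gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D, D.T @ P)
        K = np.linalg.solve(R + B.T @ Pbar @ B, B.T @ Pbar @ A)
    return K, P

if __name__ == "__main__":
    # Hypothetical second-order example (not taken from the article).
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    D = np.array([[0.05], [0.05]])
    Q = np.eye(2); R = np.eye(1); gamma = 5.0
    K0 = np.array([[1.0, 1.0]])                # assumed stabilizing initial gain
    K, P = outer_loop(A, B, D, Q, R, gamma, K0)
    print("robust gain K =", K)
    print("closed-loop spectral radius:",
          max(abs(np.linalg.eigvals(A - B @ K))))
```

Under these assumptions, the printed closed-loop spectral radius below one serves as a quick sanity check on the returned gain; the article itself analyzes the convergence and small-disturbance input-to-state stability of the actual dual-loop algorithm.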

Original language: English (US)
Pages (from-to): 7678-7693
Number of pages: 16
Journal: IEEE Transactions on Automatic Control
Volume: 69
Issue number: 11
DOIs
State: Published - 2024

Keywords

  • Policy optimization (PO)
  • risk-sensitive linear quadratic Gaussian (LQG)
  • robust reinforcement learning

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering
