Abstract
This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems subject to model mismatch, such as that arising from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Building on the formulation of classical risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property, termed small-disturbance input-to-state stability, guarantees that the policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithms.
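The abstract does not state the algorithm's equations, so the following is only a minimal sketch of what a dual-loop scheme of this kind (an outer policy-improvement step wrapped around an inner policy-evaluation loop) can look like, written for the simpler risk-neutral, known-model discrete-time LQR setting rather than the paper's risk-sensitive or model-free formulation. The system matrices, function names, and the Hewer-style update below are illustrative assumptions, not the authors' method.

```python
# Sketch (not the paper's algorithm): dual-loop policy optimization for
# discrete-time LQR with dynamics x_{k+1} = A x_k + B u_k and stage cost
# x'Qx + u'Ru. Outer loop improves the feedback gain K (u = -K x); inner
# loop evaluates the current gain via the closed-loop Lyapunov recursion.
import numpy as np

def evaluate_policy(A, B, Q, R, K, n_inner=500):
    """Inner loop: approximate P_K solving P = (A-BK)'P(A-BK) + Q + K'RK."""
    Acl = A - B @ K
    P = np.zeros_like(Q)
    for _ in range(n_inner):
        P = Acl.T @ P @ Acl + Q + K.T @ R @ K
    return P

def improve_policy(A, B, R, P):
    """Outer-loop update: K <- (R + B'PB)^{-1} B'PA (policy-improvement step)."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def dual_loop_lqr(A, B, Q, R, K0, n_outer=30):
    K = K0
    for _ in range(n_outer):
        P = evaluate_policy(A, B, Q, R, K)   # inner loop
        K = improve_policy(A, B, R, P)       # outer loop
    return K, P

if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q, R = np.eye(2), np.eye(1)
    K0 = np.array([[1.0, 1.0]])              # initial gain must be stabilizing
    K, P = dual_loop_lqr(A, B, Q, R, K0)
    print("gain K:\n", K)
```

In this risk-neutral sketch the inner loop converges because the closed-loop matrix is Schur stable; the paper's contribution concerns the risk-sensitive counterpart of such iterations, their global and uniform convergence, and their small-disturbance input-to-state stability under learning errors, none of which is captured here.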
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1-16 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Automatic Control |
| DOIs | |
| State | Accepted/In press - 2024 |
Keywords
- Approximation algorithms
- Convergence
- Estimation error
- Heuristic algorithms
- Optimization
- Performance analysis
- policy optimization
- risk-sensitive LQG
- Robust reinforcement learning
- Robustness
ASJC Scopus subject areas
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering