Abstract
Applications of direct policy search in reinforcement learning and continuous control have received increasing attention. In this work, we present novel theoretical results on the complexity of derivative-free policy optimization for an important class of robust control tasks, namely structured H∞ synthesis with static output feedback. Optimal H∞ synthesis under structural constraints leads to a constrained, nonconvex, nonsmooth problem and is typically addressed using subgradient-based policy search techniques built upon the Goldstein subdifferential or other notions of enlarged subdifferential. In this paper, we study the complexity of finding (δ, ϵ)-stationary points for such nonsmooth robust control design tasks using policy optimization methods that can access only a zeroth-order oracle (i.e., the H∞ norm of the closed-loop system). First, we study the exact oracle setting and use the coerciveness of the cost function to prove high-probability feasibility and complexity bounds for derivative-free policy optimization on this problem. Next, we derive a sample complexity result for multi-input multi-output (MIMO) H∞-norm estimation. We combine this with our analysis to obtain the first sample complexity result for model-free, trajectory-based, zeroth-order policy optimization in finding (δ, ϵ)-stationary points for structured H∞ control. Numerical results are also provided to illustrate our theory.
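
For readers unfamiliar with the stationarity notion named in the abstract, the standard definitions from the nonsmooth optimization literature can be sketched as follows. The symbols are assumptions introduced here for illustration and may differ from the paper's notation: J is a locally Lipschitz cost (here, the closed-loop H∞ norm viewed as a function of the controller gain K), ∂J is its Clarke subdifferential, and B_δ(K) is the closed ball of radius δ around K.

```latex
% Goldstein \delta-subdifferential of a locally Lipschitz cost J at K:
% the convex hull of all Clarke subgradients taken within distance \delta of K.
\partial_\delta J(K) := \operatorname{conv}\Big( \bigcup_{K' \in \mathbb{B}_\delta(K)} \partial J(K') \Big)

% K is a (\delta, \epsilon)-stationary point when the minimum-norm element
% of the Goldstein \delta-subdifferential has norm at most \epsilon.
\operatorname{dist}\big( 0, \partial_\delta J(K) \big) \le \epsilon
```

Under these definitions, setting δ = 0 recovers the usual Clarke ϵ-stationarity condition, which is why (δ, ϵ)-stationarity is regarded as a relaxation suited to nonsmooth problems.
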
Original language | English (US) |
---|---|
Journal | Advances in Neural Information Processing Systems |
Volume | 36 |
State | Published - 2023 |
Externally published | Yes |
Event | 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States |
Duration | Dec 10 2023 → Dec 16 2023 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing