Complexity of Derivative-Free Policy Optimization for Structured H Control

Xingang Guo, Darioush Keivan, Geir Dullerud, Peter Seiler, Bin Hu

Research output: Contribution to journalConference articlepeer-review

Abstract

The applications of direct policy search in reinforcement learning and continuous control have received increasing attention. In this work, we present novel theoretical results on the complexity of derivative-free policy optimization on an important class of robust control tasks, namely the structured H synthesis with static output feedback. Optimal H synthesis under structural constraints leads to a constrained nonconvex nonsmooth problem and is typically addressed using subgradient-based policy search techniques that are built upon the concept of Goldstein subdifferential or other notions of enlarged subdifferential. In this paper, we study the complexity of finding (δ, ϵ)-stationary points for such nonsmooth robust control design tasks using policy optimization methods which can only access the zeroth-order oracle (i.e. the H norm of the closed-loop system). First, we study the exact oracle setting and identify the coerciveness of the cost function to prove high-probability feasibility/complexity bounds for derivative-free policy optimization on this problem. Next, we derive a sample complexity result for the multi-input multi-output (MIMO) H-norm estimation. We combine this with our analysis to obtain the first sample complexity of model-free, trajectory-based, zeroth-order policy optimization on finding (δ, ϵ)-stationary points for structured H control. Numerical results are also provided to demonstrate our theory.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
Volume36
StatePublished - 2023
Externally publishedYes
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10 2023Dec 16 2023

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Complexity of Derivative-Free Policy Optimization for Structured H Control'. Together they form a unique fingerprint.

Cite this