TY - JOUR
T1 - Complexity of Derivative-Free Policy Optimization for Structured H∞ Control
AU - Guo, Xingang
AU - Keivan, Darioush
AU - Dullerud, Geir
AU - Seiler, Peter
AU - Hu, Bin
N1 - The work of Xingang Guo and Bin Hu is generously supported by the NSF award CAREER-2048168 and the IBM/IIDAI award 110662-01. The work of Darioush Keivan and Geir Dullerud was supported by NSF under Grant ECCS 1932735. The work of Peter Seiler was supported by the U.S. Office of Naval Research (ONR) under Grant N00014-18-1-2209.
PY - 2023
Y1 - 2023
AB - The applications of direct policy search in reinforcement learning and continuous control have received increasing attention. In this work, we present novel theoretical results on the complexity of derivative-free policy optimization on an important class of robust control tasks, namely the structured H∞ synthesis with static output feedback. Optimal H∞ synthesis under structural constraints leads to a constrained nonconvex nonsmooth problem and is typically addressed using subgradient-based policy search techniques that are built upon the concept of Goldstein subdifferential or other notions of enlarged subdifferential. In this paper, we study the complexity of finding (δ, ϵ)-stationary points for such nonsmooth robust control design tasks using policy optimization methods which can only access the zeroth-order oracle (i.e. the H∞ norm of the closed-loop system). First, we study the exact oracle setting and identify the coerciveness of the cost function to prove high-probability feasibility/complexity bounds for derivative-free policy optimization on this problem. Next, we derive a sample complexity result for the multi-input multi-output (MIMO) H∞-norm estimation. We combine this with our analysis to obtain the first sample complexity of model-free, trajectory-based, zeroth-order policy optimization on finding (δ, ϵ)-stationary points for structured H∞ control. Numerical results are also provided to demonstrate our theory.
UR - http://www.scopus.com/inward/record.url?scp=85181560344&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181560344&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85181560344
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -