On Imitation Learning of Linear Control Policies: Enforcing Stability and Robustness Constraints via LMI Conditions

Aaron Havens, Bin Hu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When applying imitation learning techniques to fit a policy from expert demonstrations, one can take advantage of prior stability/robustness assumptions on the expert's policy and incorporate such control-theoretic prior knowledge explicitly into the learning process. In this paper, we formulate the imitation learning of linear policies as a constrained optimization problem, and present efficient methods which can be used to enforce stability and robustness constraints during the learning process. Specifically, we show that one can guarantee closed-loop stability and robustness by imposing linear matrix inequality (LMI) constraints on the fitted policy. Then both the projected gradient descent method and the alternating direction method of multipliers (ADMM) can be applied to solve the resultant constrained policy fitting problem. Finally, we provide numerical results to demonstrate the effectiveness of our methods in producing linear policies with various stability and robustness guarantees.
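
As an illustration of the kind of problem the abstract describes, the following is a minimal sketch of a stability-constrained policy fit. It is a standard textbook formulation, not necessarily the one used in the paper. Everything here is an assumption introduced for illustration: the discrete-time dynamics x_{t+1} = A x_t + B u_t, the linear policy u = K x, the demonstration pairs (x_i, u_i), the squared-error loss, and the symbols P, Q, L.

```latex
% Assumed setup (not taken from the paper): known discrete-time system
%   x_{t+1} = A x_t + B u_t,  linear policy u_t = K x_t,
% and expert demonstrations (x_i, u_i), i = 1, ..., N.
\begin{align}
  \min_{K,\; P}\quad & \sum_{i=1}^{N} \lVert K x_i - u_i \rVert_2^2 \\
  \text{s.t.}\quad   & P \succ 0, \qquad
                       (A + BK)^{\top} P \,(A + BK) - P \prec 0 .
\end{align}
% The Lyapunov constraint is not jointly convex in (K, P).
% With the change of variables Q = P^{-1}, L = K Q, a Schur complement
% turns it into an LMI that is jointly linear in (Q, L):
\begin{equation}
  \begin{bmatrix}
    Q        & (AQ + BL)^{\top} \\
    AQ + BL  & Q
  \end{bmatrix} \succ 0 ,
  \qquad K = L Q^{-1} .
\end{equation}
```

Under these assumptions, any feasible pair (Q, L) yields a gain K = L Q^{-1} for which A + BK is Schur stable, so a constrained solver can enforce stability while fitting the demonstrations.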

Original language: English (US)
Title of host publication: 2021 American Control Conference, ACC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 882-887
Number of pages: 6
ISBN (Electronic): 9781665441971
State: Published - May 25 2021
Event: 2021 American Control Conference, ACC 2021 - Virtual, New Orleans, United States
Duration: May 25 2021 to May 28 2021

Publication series

Name: Proceedings of the American Control Conference
Volume: 2021-May
ISSN (Print): 0743-1619

Conference

Conference: 2021 American Control Conference, ACC 2021
Country/Territory: United States
City: Virtual, New Orleans
Period: 5/25/21 to 5/28/21

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
