Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods

Xingang Guo, Bin Hu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Building upon an intrinsic connection between value-based methods and dynamical systems, we directly use existing convex testing conditions in control theory to derive various convergence results for the aforementioned value-based methods. These testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
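
As a rough illustration of the connection the abstract describes (not the paper's own construction), value computation for a fixed policy can be read as the linear dynamical system v_{k+1} = r + gamma * P v_k. The sketch below uses a made-up 3-state chain; the matrix `P`, rewards `r`, discount `gamma`, and all helper names are assumptions chosen for illustration. It checks a simple LP-style feasibility condition for the nonnegative matrix gamma * P (a positive vector p with (gamma * P) p < p elementwise), and then confirms numerically that the candidate Lyapunov function V(e) = ||e||_inf shrinks by at least a factor gamma along the error trajectory.

```python
# Illustrative 3-state Markov chain under a fixed policy (all numbers are hypothetical).
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]
r = [1.0, 0.0, 2.0]
gamma = 0.9


def matvec(M, x):
    """Plain-Python matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def vc_step(v):
    """One value computation step: v <- r + gamma * P v."""
    Pv = matvec(P, v)
    return [ri + gamma * pvi for ri, pvi in zip(r, Pv)]


def sup_norm(x):
    """Infinity norm, used here as the candidate Lyapunov function."""
    return max(abs(xi) for xi in x)


# LP-style certificate for the nonnegative matrix A = gamma * P:
# the all-ones vector p satisfies A p < p elementwise because rows of P sum to 1.
A = [[gamma * pij for pij in row] for row in P]
p = [1.0, 1.0, 1.0]
certificate_ok = all(api < pi for api, pi in zip(matvec(A, p), p))

# Approximate the fixed point v* by iterating until the error is negligible.
v_star = [0.0, 0.0, 0.0]
for _ in range(2000):
    v_star = vc_step(v_star)

# Track V(e_k) = ||v_k - v*||_inf and record the per-step decrease ratio.
v = [10.0, -5.0, 3.0]
ratios = []
for _ in range(20):
    e_before = sup_norm([a - b for a, b in zip(v, v_star)])
    v = vc_step(v)
    e_after = sup_norm([a - b for a, b in zip(v, v_star)])
    ratios.append(e_after / e_before)

print("certificate feasible:", certificate_ok)
print("largest one-step ratio:", max(ratios))
```

Here the certificate vector is guessed by hand; in a less structured problem one would pass the same elementwise inequalities to an LP solver, or pose a matrix inequality as an SDP when a quadratic Lyapunov function is sought.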

Original language: English (US)
Title of host publication: 2022 American Control Conference, ACC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3317-3322
Number of pages: 6
ISBN (Electronic): 9781665451963
DOIs
State: Published - 2022
Event: 2022 American Control Conference, ACC 2022 - Atlanta, United States
Duration: Jun 8 2022 to Jun 10 2022

Publication series

Name: Proceedings of the American Control Conference
Volume: 2022-June
ISSN (Print): 0743-1619

Conference

Conference: 2022 American Control Conference, ACC 2022
Country/Territory: United States
City: Atlanta
Period: 6/8/22 to 6/10/22

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
