TY - GEN
T1 - Convex Programs and Lyapunov Functions for Reinforcement Learning
T2 - 2022 American Control Conference, ACC 2022
AU - Guo, Xingang
AU - Hu, Bin
N1 - Acknowledgment: This work is generously supported by the NSF award CAREER-2048168 and the 2020 Amazon research award.
PY - 2022
Y1 - 2022
N2 - Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Building upon an intrinsic connection between value-based methods and dynamical systems, we can directly use existing convex testing conditions in control theory to derive various convergence results for the aforementioned value-based methods. These testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
AB - Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Building upon an intrinsic connection between value-based methods and dynamical systems, we can directly use existing convex testing conditions in control theory to derive various convergence results for the aforementioned value-based methods. These testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
UR - http://www.scopus.com/inward/record.url?scp=85138490665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138490665&partnerID=8YFLogxK
U2 - 10.23919/ACC53348.2022.9867291
DO - 10.23919/ACC53348.2022.9867291
M3 - Conference contribution
AN - SCOPUS:85138490665
T3 - Proceedings of the American Control Conference
SP - 3317
EP - 3322
BT - 2022 American Control Conference, ACC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 June 2022 through 10 June 2022
ER -