Q-learning and Pontryagin's minimum principle

Prashant Mehta, Sean Meyn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It has proven to be effective for models with finite state and action spaces. This paper establishes connections between Q-learning and nonlinear control of continuous-time models with general state space and general action space. The main contributions are summarized as follows. (i) The starting point is the observation that the "Q-function" appearing in Q-learning algorithms is an extension of the Hamiltonian that appears in the Minimum Principle. Based on this observation we introduce the steepest descent Q-learning algorithm to obtain the optimal approximation of the Hamiltonian within a prescribed function class. (ii) A transformation of the optimality equations is performed based on the adjoint of a resolvent operator. This is used to construct a consistent algorithm based on stochastic approximation that requires only causal filtering of observations. (iii) Several examples are presented to illustrate the application of these techniques, including an application to distributed control of multi-agent systems.
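The abstract's starting point is the classical finite state/action setting, where a Q-function is learned by stochastic approximation from a trajectory generated by a non-optimal (exploratory) policy; in the deterministic continuous-time case the analogous object is the Hamiltonian c(x, u) + ∇J*(x) · f(x, u) of the Minimum Principle, which the paper approximates by steepest descent over a prescribed function class. The following is a minimal sketch of the classical tabular case only, not the continuous-time algorithm developed in the paper; the toy controlled Markov chain, the epsilon-greedy behaviour policy, and the step-size schedule are illustrative assumptions.

```python
# Minimal tabular Q-learning sketch (classical finite state/action setting).
# The chain, costs, exploration rate, and step sizes below are assumptions
# chosen for illustration; they are not taken from the Mehta-Meyn paper.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2      # tiny controlled Markov chain
gamma = 0.95                    # discount factor

def step(x, u):
    """One transition: action 0 drifts left, action 1 drifts right, plus noise."""
    drift = -1 if u == 0 else 1
    x_next = int(np.clip(x + drift + rng.integers(-1, 2), 0, n_states - 1))
    cost = float(x_next != n_states - 1) + 0.1 * (u == 1)  # reach the right end cheaply
    return x_next, cost

Q = np.zeros((n_states, n_actions))
x = 0
for k in range(1, 50_000):
    # Behaviour policy is deliberately non-optimal: epsilon-greedy exploration.
    u = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmin(Q[x]))
    x_next, cost = step(x, u)
    # Stochastic-approximation update toward the minimum-cost Bellman target.
    target = cost + gamma * Q[x_next].min()
    Q[x, u] += (1.0 / (1 + k / 1000)) * (target - Q[x, u])
    x = x_next

policy = Q.argmin(axis=1)       # greedy policy with respect to the learned Q-function
print("learned Q:\n", Q.round(2), "\ngreedy policy:", policy)
```

The greedy policy extracted from the learned Q-function is the analogue, in this finite setting, of minimizing the Hamiltonian over the action variable; the paper's contribution is extending this construction to continuous-time models with general state and action spaces.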

Original language: English (US)
Title of host publication: Proceedings of the 48th IEEE Conference on Decision and Control held jointly with 2009 28th Chinese Control Conference, CDC/CCC 2009
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3598-3605
Number of pages: 8
ISBN (Print): 9781424438716
DOIs
State: Published - 2009
Event: 48th IEEE Conference on Decision and Control held jointly with 2009 28th Chinese Control Conference, CDC/CCC 2009 - Shanghai, China
Duration: Dec 15 2009 - Dec 18 2009

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Other

Other: 48th IEEE Conference on Decision and Control held jointly with 2009 28th Chinese Control Conference, CDC/CCC 2009
Country/Territory: China
City: Shanghai
Period: 12/15/09 - 12/18/09

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
