On the Convergence of Natural Policy Gradient and Mirror Descent-Like Policy Methods for Average-Reward MDPs

Yashaswini Murthy, R. Srikant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

It is now well known that Natural Policy Gradient (NPG) converges globally for discounted-reward MDPs in the tabular setting with perfect value function estimates. However, the result cannot be directly used to obtain a corresponding convergence result for average-reward MDPs by letting the discount factor tend to one. In this paper, we prove that NPG also converges for average-reward MDPs in which each policy induces an irreducible Markov chain. Since NPG can also be interpreted as a mirror descent-based policy method, we then discuss extensions of mirror descent-based methods to non-tabular settings.
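
For concreteness, the following is a minimal Python sketch of the tabular NPG / mirror-descent update in the average-reward setting with exact policy evaluation, under the abstract's assumption that every policy induces an irreducible Markov chain. The function name, step size eta, iteration count, and the evaluation-by-linear-solve are illustrative choices made here, not the paper's notation or analysis.

import numpy as np

def npg_average_reward(P, r, eta=0.1, iters=200):
    # Sketch of tabular NPG for an average-reward MDP, assuming every policy
    # induces an irreducible Markov chain. Names and hyperparameters are
    # illustrative, not taken from the paper.
    #   P: transition probabilities, shape (S, A, S)
    #   r: one-step rewards, shape (S, A)
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)            # start from the uniform policy
    for _ in range(iters):
        # Exact policy evaluation: average reward rho and relative values v,
        # solving (I - P_pi) v + rho * 1 = r_pi together with v[0] = 0.
        P_pi = np.einsum('sa,sap->sp', pi, P)
        r_pi = np.einsum('sa,sa->s', pi, r)
        M = np.zeros((S + 1, S + 1))
        M[:S, :S] = np.eye(S) - P_pi
        M[:S, S] = 1.0                        # coefficient of rho
        M[S, 0] = 1.0                         # normalization v[0] = 0
        sol = np.linalg.solve(M, np.append(r_pi, 0.0))
        v, rho = sol[:S], sol[S]
        # Relative action values: Q(s,a) = r(s,a) - rho + sum_s' P(s,a,s') v(s')
        Q = r - rho + np.einsum('sap,p->sa', P, v)
        # NPG step for the tabular softmax policy class, equivalently a
        # KL-regularized mirror-descent (multiplicative-weights) update:
        # pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q(s,a)).
        pi = pi * np.exp(eta * Q)
        pi /= pi.sum(axis=1, keepdims=True)
    return pi, rho

Using Q instead of the advantage Q - v changes each row only by a state-dependent constant, which the normalization removes, so the resulting policy update is the same.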

Original language: English (US)
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1979-1984
Number of pages: 6
ISBN (Electronic): 9798350301243
DOIs
State: Published - 2023
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: Dec 13 2023 - Dec 15 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 62nd IEEE Conference on Decision and Control, CDC 2023
Country/Territory: Singapore
City: Singapore
Period: 12/13/23 - 12/15/23

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
