Convergence and optimality of policy gradient primal-dual method for constrained Markov decision processes

Dongsheng Ding, Kaiqing Zhang, Tamer Basar, Mihailo R. Jovanovic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We study constrained Markov decision processes with finite state and action spaces. The optimal solution of a discounted infinite-horizon optimal control problem is obtained using a Policy Gradient Primal-Dual (PG-PD) method without any policy parametrization. This method updates the primal variable via projected policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the lack of concavity of the constrained maximization problem in policy space, we exploit the underlying structure to provide non-asymptotic global convergence guarantees with sublinear rates in terms of both the optimality gap and the constraint violation. Furthermore, for a sample-based PG-PD algorithm, we quantify the sample complexity and provide computational experiments that demonstrate the effectiveness of our results.
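To make the primal-dual structure described above concrete, the following is a minimal sketch, assuming a small randomly generated constrained MDP, exact (model-based) evaluation of the reward and utility value functions, and illustrative step sizes and constraint threshold `b` that are not taken from the paper. The policy is kept in tabular (direct, non-parametrized) form; the primal step is projected gradient ascent on the Lagrangian over the probability simplex, and the dual step is projected sub-gradient descent on the multiplier.

```python
# Illustrative sketch of a Policy Gradient Primal-Dual (PG-PD) loop for a constrained MDP.
# The random MDP, step sizes eta_pi/eta_lam, dual bound lam_max, and threshold b are
# assumptions for demonstration, not the paper's setup.
import numpy as np

def simplex_projection(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[j]) / (j + 1)
    return np.maximum(v + theta, 0.0)

def value_and_q(P, r, pi, gamma, rho):
    """Exact discounted value, Q-function, and discounted state occupancy d_rho under pi."""
    S, A = r.shape
    P_pi = np.einsum('sab,sa->sb', P, pi)        # state-to-state kernel induced by pi
    r_pi = np.einsum('sa,sa->s', r, pi)          # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum('sab,b->sa', P, V)
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return rho @ V, Q, d

# --- tiny random constrained MDP (assumed, for illustration only) ---
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a, s'] transition probabilities
r = rng.uniform(size=(S, A))                     # reward to maximize
g = rng.uniform(size=(S, A))                     # utility required to satisfy V_g >= b
b = 4.0                                          # constraint threshold (assumed)
rho = np.full(S, 1.0 / S)                        # initial state distribution

pi = np.full((S, A), 1.0 / A)                    # primal variable: tabular policy
lam, lam_max = 1.0, 100.0                        # dual variable and its projection bound
eta_pi, eta_lam = 0.1, 0.05                      # step sizes (assumed)

for t in range(500):
    Vr, Qr, d = value_and_q(P, r, pi, gamma, rho)
    Vg, Qg, _ = value_and_q(P, g, pi, gamma, rho)
    # primal step: projected policy gradient ascent on the Lagrangian V_r + lam * (V_g - b)
    grad = d[:, None] * (Qr + lam * Qg) / (1 - gamma)
    pi = np.array([simplex_projection(pi[s] + eta_pi * grad[s]) for s in range(S)])
    # dual step: projected sub-gradient descent on lam over [0, lam_max]
    lam = np.clip(lam - eta_lam * (Vg - b), 0.0, lam_max)

print(f"value {Vr:.3f}, utility {Vg:.3f} (threshold {b}), lambda {lam:.3f}")
```

The sample-based variant mentioned in the abstract replaces the exact value and Q-function evaluations above with estimates obtained from sampled trajectories; the primal-dual update structure stays the same.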

Original language: English (US)
Title of host publication: 2022 American Control Conference, ACC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2851-2856
Number of pages: 6
ISBN (Electronic): 9781665451963
State: Published - 2022
Event: 2022 American Control Conference, ACC 2022 - Atlanta, United States
Duration: Jun 8 2022 - Jun 10 2022

Publication series

Name: Proceedings of the American Control Conference
Volume: 2022-June
ISSN (Print): 0743-1619

Conference

Conference: 2022 American Control Conference, ACC 2022
Country/Territory: United States
City: Atlanta
Period: 6/8/22 - 6/10/22

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
