Performance Bounds for Policy-Based Reinforcement Learning Methods in Zero-Sum Markov Games with Linear Function Approximation

Anna Winnicki, R. Srikant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Until recently, efficient, provably convergent policy iteration algorithms for zero-sum Markov games were unknown, so model-based RL algorithms for such problems could not use policy iteration in their planning modules. In an earlier paper, we showed that a convergent policy iteration algorithm can be obtained by using lookahead, a technique commonly used in RL. However, that algorithm could be applied in the function approximation setting only in the special case of linear MDPs (Markov Decision Processes). In this paper, we obtain performance bounds for policy-based RL algorithms in general settings, including one where policy evaluation is performed using noisy samples of (state, action, reward) triplets from a single sample path of a given policy.
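To make the lookahead idea concrete, below is a minimal, illustrative sketch of policy iteration with H-step lookahead in a tabular zero-sum Markov game. It is not the algorithm from the paper: the tensors P and R, the LP-based matrix-game solver, and all function names are assumptions made for this sketch, and the paper's setting additionally covers linear function approximation and policy evaluation from noisy samples along a single trajectory.

```python
# Illustrative sketch only: tabular zero-sum Markov game with H-step
# lookahead in the policy improvement step. All names here (P, R,
# matrix_game_value, ...) are hypothetical, not from the paper.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Value and maximizer's mixed strategy of the zero-sum matrix game Q,
    i.e. max_x min_j x^T Q[:, j], solved as a linear program."""
    m, n = Q.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # maximize v
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])     # v <= x^T Q[:, j] for all j
    A_eq = np.hstack([np.ones((1, m)), [[0.0]]])  # x is a distribution
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)], method="highs")
    return res.x[-1], res.x[:m]

def bellman(V, P, R, gamma):
    """One application of the minimax Bellman operator T to V."""
    S = V.shape[0]
    TV = np.empty(S)
    for s in range(S):
        Q = R[s] + gamma * (P[s] @ V)             # m x n stage game at state s
        TV[s], _ = matrix_game_value(Q)
    return TV

def lookahead_policy_iteration(P, R, gamma, H=2, K=20, eval_sweeps=100):
    """Policy improvement is greedy w.r.t. T^H V (the lookahead step);
    evaluation pits the fixed policy against the opponent's best response."""
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(K):
        W = V
        for _ in range(H - 1):                    # H-1 operator applications...
            W = bellman(W, P, R, gamma)
        pi = []
        for s in range(S):                        # ...plus one more for improvement
            _, x = matrix_game_value(R[s] + gamma * (P[s] @ W))
            pi.append(x)
        for _ in range(eval_sweeps):              # approximate evaluation of pi
            V = np.array([(pi[s] @ (R[s] + gamma * (P[s] @ V))).min()
                          for s in range(S)])
    return V

# Tiny random instance: 4 states, 2 actions per player.
rng = np.random.default_rng(0)
S, m, n = 4, 2, 2
P = rng.dirichlet(np.ones(S), size=(S, m, n))     # P[s, a, b] is a distribution
R = rng.uniform(-1.0, 1.0, size=(S, m, n))
print(lookahead_policy_iteration(P, R, gamma=0.9))
```

The H-step improvement is what the abstract calls lookahead; with H = 1 the sketch reduces to naive policy iteration, the variant that was not known to converge for zero-sum Markov games.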

Original language: English (US)
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7144-7149
Number of pages: 6
ISBN (Electronic): 9798350301243
DOIs
State: Published - 2023
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: Dec 13, 2023 – Dec 15, 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 62nd IEEE Conference on Decision and Control, CDC 2023
Country/Territory: Singapore
City: Singapore
Period: 12/13/23 – 12/15/23

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
