Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

Anna Winnicki, R. Srikant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs). The variant combines stochastic approximation algorithms with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms: the first uses a least squares approach, in which a new set of weights associated with the feature vectors is obtained via least squares minimization at each iteration; the second is a two-time-scale algorithm that takes several gradient descent steps toward the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
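The difference between the two policy-evaluation updates described in the abstract can be sketched in a few lines. The sketch below is illustrative only and is not the paper's algorithm: the feature matrix, value estimates, step size, and iteration count are all hypothetical, and the simulation-based value estimates are replaced by a fixed noisy vector. It contrasts a full least-squares fit of the feature weights (first algorithm) with taking several gradient descent steps toward that least-squares solution (second algorithm, in the spirit of the two-time-scale update).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): n states, d features.
n, d = 50, 5
Phi = rng.normal(size=(n, d))   # feature matrix, one row of features per state
v_hat = rng.normal(size=n)      # stand-in for noisy simulation-based value estimates

# First algorithm (sketch): solve the least-squares problem exactly each
# iteration, i.e. theta_ls = argmin_theta ||Phi @ theta - v_hat||^2.
theta_ls, *_ = np.linalg.lstsq(Phi, v_hat, rcond=None)

# Second algorithm (sketch): instead of solving exactly, take several
# gradient descent steps toward the least-squares solution.
theta = np.zeros(d)
step = 0.01  # illustrative step size
for _ in range(200):
    # Gradient of 0.5 * ||Phi @ theta - v_hat||^2 with respect to theta.
    grad = Phi.T @ (Phi @ theta - v_hat)
    theta -= step * grad

# After enough gradient steps, the iterate approaches the exact fit.
print(np.linalg.norm(theta - theta_ls))
```

In the two-time-scale setting, the number of inner gradient steps trades computation per iteration against the accuracy of the policy-evaluation estimate used by the outer stochastic approximation update.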

Original language: English (US)
Title of host publication: 2022 IEEE 61st Conference on Decision and Control, CDC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 801-806
Number of pages: 6
ISBN (Electronic): 9781665467612
DOIs
State: Published - 2022
Event: 61st IEEE Conference on Decision and Control, CDC 2022 - Cancun, Mexico
Duration: Dec 6 2022 - Dec 9 2022

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2022-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 61st IEEE Conference on Decision and Control, CDC 2022
Country/Territory: Mexico
City: Cancun
Period: 12/6/22 - 12/9/22

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization

