Convergence of Monte Carlo Exploring Starts with TD-Learning

Anna Winnicki, R. Srikant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

TD-learning is widely used in reinforcement learning algorithms due to its efficiency and practicality. Herein, we study the convergence of a variant of Monte Carlo Exploring Starts when TD(λ) is used in policy evaluation and policy improvement, and lookahead is used in the policy improvement step. Our results provide a threshold for the amount of lookahead that ensures convergence of Monte Carlo Exploring Starts with TD(λ), as a function of λ ∈ [0,1].

Original language: English (US)
Title of host publication: 2024 IEEE 63rd Conference on Decision and Control, CDC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3865-3870
Number of pages: 6
ISBN (Electronic): 9798350316339
State: Published - 2024
Event: 63rd IEEE Conference on Decision and Control, CDC 2024 - Milan, Italy
Duration: Dec 16 2024 to Dec 19 2024

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 63rd IEEE Conference on Decision and Control, CDC 2024
Country/Territory: Italy
City: Milan
Period: 12/16/24 to 12/19/24

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
