Towards finite-sample convergence of direct reinforcement learning

Shiau Hong Lim, Gerald DeJong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have so far admitted theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. This notion justifies the use of a greedy strategy that we believe performs very well in practice, and it is a key step in deriving finite-sample convergence guarantees for direct reinforcement learning. We present empirical evidence that supports our idea.
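
For illustration, a minimal sketch of the learner the abstract describes, tabular Q-learning with optimistic initial values, a constant learning rate, and purely greedy action selection, might look as follows. The environment interface (reset/step), the parameter values, and all names here are assumptions made for the sketch, not details taken from the paper.

    import numpy as np

    def greedy_q_learning(env, n_states, n_actions, episodes=500,
                          alpha=0.1, gamma=0.95, q_init=1.0):
        # Optimistic initialization: every (state, action) pair starts at
        # q_init, chosen above any achievable return, so untried actions
        # look attractive and drive exploration without randomization.
        Q = np.full((n_states, n_actions), q_init)
        for _ in range(episodes):
            state = env.reset()  # assumed environment interface
            done = False
            while not done:
                # Greedy strategy: always take the currently best action.
                action = int(np.argmax(Q[state]))
                next_state, reward, done = env.step(action)  # assumed interface
                target = reward if done else reward + gamma * np.max(Q[next_state])
                # Constant learning rate alpha (no decay schedule).
                Q[state, action] += alpha * (target - Q[state, action])
                state = next_state
        return Q

In this sketch the optimism does the exploratory work: unvisited actions retain their inflated initial values, so the greedy policy is repeatedly drawn to them until their estimates settle toward realistic returns.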

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 230-241
Number of pages: 12
DOIs
State: Published - 2005
Externally published: Yes
Event: 16th European Conference on Machine Learning, ECML 2005 - Porto, Portugal
Duration: Oct 3, 2005 – Oct 7, 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3720 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th European Conference on Machine Learning, ECML 2005
Country/Territory: Portugal
City: Porto
Period: 10/3/05 – 10/7/05

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
