Qualitative reinforcement learning

Arkady Epshteyn, Gerald DeJong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is long and potentially dangerous exploration. We present a framework that allows an expert to specify imprecise knowledge of transition probabilities in the form of stochastic dominance constraints. Our algorithm can be used to find optimal policies for qualitatively specified problems or, when no such solution is available, to decrease the required amount of exploration. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain-car ascent and cart-pole balancing.
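The central idea in the abstract, expressing imprecise knowledge of transition probabilities as stochastic dominance constraints, can be illustrated with a minimal sketch. The code below is not the authors' algorithm; it only shows the standard first-order stochastic dominance test over a finite state space ordered by desirability, and the pruning argument such constraints license: if one action's next-state distribution dominates another's, it yields at least as much expected value under any monotone value function, without knowing exact probabilities. The distributions `p_fast` and `p_slow` and the value function `v` are hypothetical placeholders.

```python
import numpy as np

def dominates(p, q, tol=1e-12):
    """First-order stochastic dominance: p dominates q over states
    ordered from worst to best iff the CDF of p never exceeds the CDF of q."""
    return bool(np.all(np.cumsum(p) <= np.cumsum(q) + tol))

# Hypothetical next-state distributions over three states ordered by desirability.
p_fast = np.array([0.1, 0.2, 0.7])  # action believed to push toward better states
p_slow = np.array([0.3, 0.4, 0.3])

# Any non-decreasing value function over the ordered states.
v = np.array([0.0, 1.0, 5.0])

if dominates(p_fast, p_slow):
    # For every monotone v, E_{p_fast}[v] >= E_{p_slow}[v], so the
    # dominated action can be pruned from consideration in this state.
    assert p_fast @ v >= p_slow @ v
    print("p_fast dominates p_slow:", p_fast @ v, ">=", p_slow @ v)
```

In this toy case the dominance check succeeds (CDFs [0.1, 0.3, 1.0] vs. [0.3, 0.7, 1.0]), so the expected-value comparison holds for any monotone value function, which is the kind of qualitative conclusion an expert constraint can support before any exploration.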

Original language: English (US)
Title of host publication: ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
Pages: 305-312
Number of pages: 8
State: Published - Oct 6 2006
Event: ICML 2006: 23rd International Conference on Machine Learning - Pittsburgh, PA, United States
Duration: Jun 25, 2006 - Jun 29, 2006

Publication series

Name: ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
Volume: 2006

Other

Other: ICML 2006: 23rd International Conference on Machine Learning
Country/Territory: United States
City: Pittsburgh, PA
Period: 6/25/06 - 6/29/06

ASJC Scopus subject areas

  • Engineering (all)
