The Influence of Reward on the Speed of Reinforcement Learning: An Analysis of Shaping

Adam Laud, Gerald DeJong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Shaping can be an effective method for improving the learning rate in reinforcement learning systems. Previously, shaping has been heuristically motivated and implemented. We provide a formal structure with which to interpret the improvement afforded by shaping rewards. Central to our model is the idea of a reward horizon, which focuses exploration on an MDP's critical region, a subset of states with the property that any policy that performs well on the critical region also performs well on the MDP. We provide a simple algorithm and prove that its learning time is polynomial in the size of the critical region and, crucially, independent of the size of the MDP. This identifies low reward horizons with easy-to-learn MDPs. Shaping rewards, which encode our prior knowledge about the relative merits of decisions, can be seen as artificially reducing the MDP's natural reward horizon. We demonstrate empirically the effects of using shaping to reduce the reward horizon.
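To make the idea of a shaping reward concrete, the sketch below uses potential-based shaping (a common formalization due to Ng et al., not the specific algorithm analyzed in this paper). The potential function `phi` and the 1-D chain environment are hypothetical illustrations: with a sparse base reward of 0 until the goal, the shaping term delivers informative feedback at every step, which is the sense in which shaping can shorten the reward horizon.

```python
# Illustrative sketch only: potential-based reward shaping on a 1-D chain.
# `phi` (negative distance to an assumed goal state) and the constants
# below are hypothetical, not taken from the paper.

def phi(state, goal=9):
    """Potential: negative distance to the (assumed) goal state."""
    return -abs(goal - state)

def shaped_reward(state, next_state, base_reward, gamma=0.95):
    """Base environment reward plus the potential-based shaping term
    gamma * phi(s') - phi(s), which preserves the optimal policy."""
    return base_reward + gamma * phi(next_state) - phi(state)

# A step toward the goal earns an immediate positive bonus even though
# the base reward (0 everywhere short of the goal) is uninformative.
print(shaped_reward(state=3, next_state=4, base_reward=0.0))  # 1.25
```

Because the shaping term is a telescoping difference of potentials, it changes the magnitude and timing of rewards along a trajectory without changing which policies are optimal, so it can only affect how quickly a learner distinguishes good decisions from bad ones.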

Original language: English (US)
Title of host publication: Proceedings, Twentieth International Conference on Machine Learning
Editors: T. Fawcett, N. Mishra
Pages: 440-447
Number of pages: 8
Volume: 1
State: Published - 2003
Event: Twentieth International Conference on Machine Learning - Washington, DC, United States
Duration: Aug 21, 2003 - Aug 24, 2003


ASJC Scopus subject areas

  • Engineering (all)

