The dependence of effective planning horizon on model accuracy

Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

For Markov decision processes with long horizons (i.e., discount factors close to one), it is common in practice to use reduced horizons during planning to speed computation. However, perhaps surprisingly, when the model available to the agent is estimated from data, as will be the case in most real-world problems, the policy found using a shorter planning horizon can actually be better than a policy learned with the true horizon. In this paper we provide a precise explanation for this phenomenon based on principles of learning theory. We show formally that the planning horizon is a complexity control parameter for the class of policies to be learned. In particular, it has an intuitive, monotonic relationship with a simple counting measure of complexity, and a similar relationship can be observed empirically with a more general and data-dependent Rademacher complexity measure. Each complexity measure gives rise to a bound on the planning loss, predicting that a planning horizon shorter than the true horizon can reduce overfitting and improve test performance, and we confirm these predictions empirically.
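The setup described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' experimental protocol: the MDP is a small random one, rewards are assumed known so that only transitions are estimated from samples, and every function name (`random_mdp`, `estimate_model`, `plan`, `evaluate`, `planning_loss`) is hypothetical. The policy is planned on the estimated model with a planning discount factor `gamma_plan`, then scored in the true MDP under the evaluation discount factor `GAMMA_EVAL`.

```python
import random

def random_mdp(n_states, n_actions, rng):
    """Hypothetical test MDP: random transition rows, rewards in [0, 1]."""
    P, R = [], []
    for _s in range(n_states):
        P_s, R_s = [], []
        for _a in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R

def estimate_model(P, n_samples, rng):
    """Empirical transition frequencies from n_samples draws per (s, a)."""
    n_states = len(P)
    P_hat = []
    for s in range(n_states):
        row = []
        for dist in P[s]:
            draws = rng.choices(range(n_states), weights=dist, k=n_samples)
            row.append([draws.count(t) / n_samples for t in range(n_states)])
        P_hat.append(row)
    return P_hat

def q_values(P, R, V, gamma, s):
    return [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(R[s]))]

def plan(P, R, gamma, tol=1e-10):
    """Value iteration (gamma < 1); returns the greedy policy."""
    V = [0.0] * len(R)
    while True:
        new_V = [max(q_values(P, R, V, gamma, s)) for s in range(len(R))]
        converged = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if converged:
            break
    policy = []
    for s in range(len(R)):
        q = q_values(P, R, V, gamma, s)
        policy.append(q.index(max(q)))
    return policy

def evaluate(P, R, policy, gamma, tol=1e-10):
    """Iterative policy evaluation of a fixed policy in the MDP (P, R)."""
    V = [0.0] * len(R)
    while True:
        new_V = [q_values(P, R, V, gamma, s)[policy[s]] for s in range(len(R))]
        converged = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if converged:
            return V

def planning_loss(P, R, P_hat, gamma_plan, gamma_eval):
    """Mean over states of the value lost, in the true MDP, by planning
    on P_hat with gamma_plan instead of on P with gamma_eval."""
    v_star = evaluate(P, R, plan(P, R, gamma_eval), gamma_eval)
    v_hat = evaluate(P, R, plan(P_hat, R, gamma_plan), gamma_eval)
    return sum(x - y for x, y in zip(v_star, v_hat)) / len(R)

rng = random.Random(0)
GAMMA_EVAL = 0.95  # assumed "true" discount factor for this sketch
P, R = random_mdp(5, 2, rng)
P_hat = estimate_model(P, n_samples=3, rng=rng)  # deliberately little data
losses = {g: planning_loss(P, R, P_hat, g, GAMMA_EVAL)
          for g in (0.0, 0.5, 0.8, 0.95)}
for g, loss in losses.items():
    print(f"gamma_plan={g:.2f}  planning loss={loss:.4f}")
```

A single random draw is noisy, so the loss for any one seed need not favor a particular planning discount factor. Averaging `planning_loss` over many sampled MDPs and datasets, and varying `n_samples`, is one way to look for the trend the abstract describes.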

Original language: English (US)
Title of host publication: AAMAS 2015 - Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems
Editors: Edith Elkind, Gerhard Weiss, Pinar Yolum, Rafael H. Bordini
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 1181-1189
Number of pages: 9
ISBN (Electronic): 9781450337700
State: Published - 2015
Externally published: Yes
Event: 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2015 - Istanbul, Turkey
Duration: May 4 2015 to May 8 2015

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Other

Other: 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2015
Country/Territory: Turkey
City: Istanbul
Period: 5/4/15 to 5/8/15

Keywords

  • Discount factor
  • Over-fitting
  • Reinforcement learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
