TY - GEN
T1 - The dependence of effective planning horizon on model accuracy
AU - Jiang, Nan
AU - Kulesza, Alex
AU - Singh, Satinder
AU - Lewis, Richard
N1 - Publisher Copyright:
Copyright © 2015, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2015
Y1 - 2015
N2 - For Markov decision processes with long horizons (i.e., discount factors close to one), it is common in practice to use reduced horizons during planning to speed computation. However, perhaps surprisingly, when the model available to the agent is estimated from data, as will be the case in most real-world problems, the policy found using a shorter planning horizon can actually be better than a policy learned with the true horizon. In this paper we provide a precise explanation for this phenomenon based on principles of learning theory. We show formally that the planning horizon is a complexity control parameter for the class of policies to be learned. In particular, we show that it has an intuitive, monotonic relationship with a simple counting measure of complexity, and that a similar relationship can be observed empirically with a more general, data-dependent Rademacher complexity measure. Each complexity measure gives rise to a bound on the planning loss predicting that a planning horizon shorter than the true horizon can reduce overfitting and improve test performance, and we confirm these predictions empirically.
AB - For Markov decision processes with long horizons (i.e., discount factors close to one), it is common in practice to use reduced horizons during planning to speed computation. However, perhaps surprisingly, when the model available to the agent is estimated from data, as will be the case in most real-world problems, the policy found using a shorter planning horizon can actually be better than a policy learned with the true horizon. In this paper we provide a precise explanation for this phenomenon based on principles of learning theory. We show formally that the planning horizon is a complexity control parameter for the class of policies to be learned. In particular, we show that it has an intuitive, monotonic relationship with a simple counting measure of complexity, and that a similar relationship can be observed empirically with a more general, data-dependent Rademacher complexity measure. Each complexity measure gives rise to a bound on the planning loss predicting that a planning horizon shorter than the true horizon can reduce overfitting and improve test performance, and we confirm these predictions empirically.
KW - Discount factor
KW - Over-fitting
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=84945187448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84945187448&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84945187448
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 1181
EP - 1189
BT - AAMAS 2015 - Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems
A2 - Elkind, Edith
A2 - Weiss, Gerhard
A2 - Yolum, Pinar
A2 - Bordini, Rafael H.
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2015
Y2 - 4 May 2015 through 8 May 2015
ER -