TY - JOUR
T1 - On structural properties of MDPs that bound loss due to shallow planning
AU - Jiang, Nan
AU - Singh, Satinder
AU - Tewari, Ambuj
N1 - This work was supported by NSF grants IIS 1319365 (for Jiang and Singh) and IIS 1452099 (for Tewari). Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the views of the sponsors
PY - 2016
Y1 - 2016
N2 - Planning in MDPs often uses a smaller planning horizon than specified in the problem to save computational expense at the risk of a loss due to suboptimal plans. Jiang et al. [2015b] recently showed that smaller than specified planning horizons can in fact be beneficial in cases where the MDP model is learned from data and therefore not accurate. In this paper, we consider planning with accurate models and investigate structural properties of MDPs that bound the loss incurred by using smaller than specified planning horizons. We identify a number of structural parameters some of which depend on the reward function alone, some on the transition dynamics alone, and some that depend on the interaction between rewards and transition dynamics. We provide planning loss bounds in terms of these structural parameters and, in some cases, also show tightness of the upper bounds. Empirical results with randomly generated MDPs are used to validate qualitative properties of our theoretical bounds for shallow planning.
AB - Planning in MDPs often uses a smaller planning horizon than specified in the problem to save computational expense at the risk of a loss due to suboptimal plans. Jiang et al. [2015b] recently showed that smaller than specified planning horizons can in fact be beneficial in cases where the MDP model is learned from data and therefore not accurate. In this paper, we consider planning with accurate models and investigate structural properties of MDPs that bound the loss incurred by using smaller than specified planning horizons. We identify a number of structural parameters some of which depend on the reward function alone, some on the transition dynamics alone, and some that depend on the interaction between rewards and transition dynamics. We provide planning loss bounds in terms of these structural parameters and, in some cases, also show tightness of the upper bounds. Empirical results with randomly generated MDPs are used to validate qualitative properties of our theoretical bounds for shallow planning.
UR - http://www.scopus.com/inward/record.url?scp=85006108607&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006108607&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85006108607
SN - 1045-0823
VL - 2016-January
SP - 1640
EP - 1647
JO - IJCAI International Joint Conference on Artificial Intelligence
JF - IJCAI International Joint Conference on Artificial Intelligence
T2 - 25th International Joint Conference on Artificial Intelligence, IJCAI 2016
Y2 - 9 July 2016 through 15 July 2016
ER -