TY - GEN

T1 - Batch Value-function Approximation with Only Realizability

AU - Xie, Tengyang

AU - Jiang, Nan

N1 - Publisher Copyright:
Copyright © 2021 by the author(s)

PY - 2021

Y1 - 2021

N2 - We make progress in a long-standing problem of batch reinforcement learning (RL): learning Q? from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class. In fact, all existing algorithms demand function-approximation assumptions stronger than realizability, and the mounting negative evidence has led to a conjecture that sample-efficient learning is impossible in this setting (Chen & Jiang, 2019). Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure that reduces the learning problem to pairwise comparison, and solves the latter with the help of a state-action-space partition constructed from the compared functions. We also discuss how BVFT can be applied to model selection among other extensions and open problems.

AB - We make progress in a long-standing problem of batch reinforcement learning (RL): learning Q? from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class. In fact, all existing algorithms demand function-approximation assumptions stronger than realizability, and the mounting negative evidence has led to a conjecture that sample-efficient learning is impossible in this setting (Chen & Jiang, 2019). Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure that reduces the learning problem to pairwise comparison, and solves the latter with the help of a state-action-space partition constructed from the compared functions. We also discuss how BVFT can be applied to model selection among other extensions and open problems.

UR - http://www.scopus.com/inward/record.url?scp=85161295457&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85161295457&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85161295457

T3 - Proceedings of Machine Learning Research

SP - 11404

EP - 11413

BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021

PB - ML Research Press

T2 - 38th International Conference on Machine Learning, ICML 2021

Y2 - 18 July 2021 through 24 July 2021

ER -