TY - GEN
T1 - Offline Reinforcement Learning Under Value and Density-Ratio Realizability
T2 - 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022
AU - Chen, Jinglin
AU - Jiang, Nan
N1 - Publisher Copyright:
© 2022 Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022. All rights reserved.
PY - 2022
Y1 - 2022
N2 - We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only realizability-type assumptions for the function approximators. While the existing theory has addressed learning under realizability and under non-exploratory data separately, no work has been able to address both simultaneously (except for a concurrent work, with which we compare in detail). Under an additional gap assumption, we provide guarantees for a simple pessimistic algorithm based on a version space formed by marginalized importance sampling (MIS), and the guarantee only requires the data to cover the optimal policy and the function classes to realize the optimal value and density-ratio functions. While similar gap assumptions have been used in other areas of RL theory, our work is the first to identify the utility and the novel mechanism of gap assumptions in offline RL with weak function approximation.
UR - http://www.scopus.com/inward/record.url?scp=85142383030&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142383030&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85142383030
T3 - Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022
SP - 378
EP - 388
BT - Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022
PB - Association For Uncertainty in Artificial Intelligence (AUAI)
Y2 - 1 August 2022 through 5 August 2022
ER -