TY - GEN
T1 - PAC reinforcement learning with an imperfect model
AU - Jiang, Nan
N1 - Publisher Copyright:
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2018
Y1 - 2018
N2 - Reinforcement learning (RL) methods have proved to be successful in many simulated environments. The common approaches, however, are often too sample intensive to be applied directly in the real world. A promising approach to addressing this issue is to train an RL agent in a simulator and transfer the solution to the real environment. When a high-fidelity simulator is available we would expect significant reduction in the amount of real trajectories needed for learning. In this work we aim at better understanding the theoretical nature of this approach. We start with a perhaps surprising result that, even if the approximate model (e.g., a simulator) only differs from the real environment in a single state-action pair (but which one is unknown), such a model could be information-theoretically useless and the sample complexity (in terms of real trajectories) still scales with the total number of states in the worst case. We investigate the hard instances and come up with natural conditions that avoid the pathological situations. We then propose two conceptually simple algorithms that enjoy polynomial sample complexity guarantees with no dependence on the size of the state-action space, and prove some foundational results to provide insights into this important problem.
AB - Reinforcement learning (RL) methods have proved to be successful in many simulated environments. The common approaches, however, are often too sample intensive to be applied directly in the real world. A promising approach to addressing this issue is to train an RL agent in a simulator and transfer the solution to the real environment. When a high-fidelity simulator is available we would expect significant reduction in the amount of real trajectories needed for learning. In this work we aim at better understanding the theoretical nature of this approach. We start with a perhaps surprising result that, even if the approximate model (e.g., a simulator) only differs from the real environment in a single state-action pair (but which one is unknown), such a model could be information-theoretically useless and the sample complexity (in terms of real trajectories) still scales with the total number of states in the worst case. We investigate the hard instances and come up with natural conditions that avoid the pathological situations. We then propose two conceptually simple algorithms that enjoy polynomial sample complexity guarantees with no dependence on the size of the state-action space, and prove some foundational results to provide insights into this important problem.
UR - http://www.scopus.com/inward/record.url?scp=85060437358&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060437358&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85060437358
T3 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
SP - 3334
EP - 3341
BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
PB - American Association for Artificial Intelligence (AAAI) Press
T2 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Y2 - 2 February 2018 through 7 February 2018
ER -