Abstract
We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation-accessing policy and value function classes exclusively through standard optimization primitives-and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE [1], cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
Original language | English (US) |
---|---|
Pages (from-to) | 1422-1432 |
Number of pages | 11 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2018-December |
State | Published - 2018 |
Event | 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada Duration: Dec 2 2018 → Dec 8 2018 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing