On oracle-efficient PAC RL with rich observations

Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation, accessing policy and value function classes exclusively through standard optimization primitives, and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE [1], cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
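The "oracle model of computation" mentioned in the abstract can be made concrete with a small sketch. The abstract does not name the specific primitives, so the interfaces below (a squared-loss regression oracle over a value-function class and a cost-sensitive classification oracle over a policy class) are illustrative assumptions only: the point is that the learner queries such optimization routines rather than enumerating the classes itself.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interfaces: the oracle names and signatures below are
# illustrative, not the paper's actual API.

Obs = Tuple[float, ...]          # a rich observation, e.g. a feature vector
ValueFn = Callable[[Obs], float]
Policy = Callable[[Obs], int]


def regression_oracle(value_class: Sequence[ValueFn],
                      data: Sequence[Tuple[Obs, float]]) -> ValueFn:
    """Squared-loss regression oracle: return the value function with the
    smallest empirical squared error on (observation, target) pairs.
    A practical oracle would invoke an efficient solver; the brute-force
    minimum here only makes the interface concrete for a tiny finite class."""
    return min(value_class,
               key=lambda f: sum((f(x) - y) ** 2 for x, y in data))


def cost_sensitive_oracle(policy_class: Sequence[Policy],
                          data: Sequence[Tuple[Obs, Sequence[float]]]) -> Policy:
    """Cost-sensitive classification oracle: return the policy with the
    smallest total cost, where each example carries a per-action cost vector."""
    return min(policy_class,
               key=lambda pi: sum(costs[pi(x)] for x, costs in data))
```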

Original language: English (US)
Pages (from-to): 1422-1432
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2, 2018 - Dec 8, 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
