On oracle-efficient PAC RL with rich observations

Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Research output: Contribution to journal › Conference article

Abstract

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation, accessing policy and value function classes exclusively through standard optimization primitives, and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE [1], cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
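
To make the oracle model concrete, the following is a minimal sketch (our illustration, not code from the paper) of the kind of "standard optimization primitive" the abstract refers to: an empirical-reward-maximization oracle that a learner queries over a policy class instead of enumerating the class itself. All names here (Policy, argmax_oracle) are hypothetical.

# Hypothetical illustration (not from the paper): an ERM-style argmax
# oracle over a policy class. Oracle-efficient algorithms interact with
# the class only through calls like this, never by enumeration.
from typing import Callable, Sequence, Tuple

Policy = Callable[[str], int]     # maps an observation to an action
Example = Tuple[str, int, float]  # (observation, action taken, reward)

def argmax_oracle(policies: Sequence[Policy], data: Sequence[Example]) -> Policy:
    """Return a policy with maximal empirical reward on the dataset.

    The brute-force max below is only for illustration; the point of the
    oracle model is that this primitive could be implemented by an
    efficient classification or regression solver for structured classes.
    """
    return max(policies, key=lambda pi: sum(r for x, a, r in data if pi(x) == a))

# Toy usage: a context-sensitive policy beats a constant one.
pi_const: Policy = lambda x: 0
pi_smart: Policy = lambda x: 1 if x == "right" else 0
data = [("left", 0, 1.0), ("right", 1, 1.0), ("right", 0, 0.0)]
assert argmax_oracle([pi_const, pi_smart], data) is pi_smart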

Original language: English (US)
Pages (from-to): 1422-1432
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
ISSN: 1049-5258
State: Published - Jan 1 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 - Dec 8 2018

Fingerprint

Reinforcement learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Dann, C., Jiang, N., Krishnamurthy, A., Agarwal, A., Langford, J., & Schapire, R. E. (2018). On oracle-efficient PAC RL with rich observations. Advances in Neural Information Processing Systems, 2018-December, 1422-1432.
