Provably efficient RL with rich observations via latent state decoding

Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, wc demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. Wc provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over Q-learning with naïve exploration, even when Q-lcarning has cheating access to latent states.

Original languageEnglish (US)
Title of host publication36th International Conference on Machine Learning, ICML 2019
PublisherInternational Machine Learning Society (IMLS)
Pages2971-3002
Number of pages32
ISBN (Electronic)9781510886988
StatePublished - 2019
Event36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States
Duration: Jun 9 2019Jun 15 2019

Publication series

Name36th International Conference on Machine Learning, ICML 2019
Volume2019-June

Conference

Conference36th International Conference on Machine Learning, ICML 2019
Country/TerritoryUnited States
CityLong Beach
Period6/9/196/15/19

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Human-Computer Interaction

Fingerprint

Dive into the research topics of 'Provably efficient RL with rich observations via latent state decoding'. Together they form a unique fingerprint.

Cite this