TY - GEN
T1 - Provably efficient RL with rich observations via latent state decoding
AU - Du, Simon S.
AU - Krishnamurthy, Akshay
AU - Jiang, Nan
AU - Agarwal, Alekh
AU - Dudík, Miroslav
AU - Langford, John
N1 - Publisher Copyright: Copyright 2019 by the author(s).
PY - 2019
Y1 - 2019
N2 - We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over Q-learning with naïve exploration, even when Q-learning has cheating access to latent states.
AB - We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over Q-learning with naïve exploration, even when Q-learning has cheating access to latent states.
UR - http://www.scopus.com/inward/record.url?scp=85079450217&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079450217&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85079450217
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 2971
EP - 3002
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -