TY - JOUR
T1 - Reinforcement Learning under Latent Dynamics
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
AU - Amortila, Philip
AU - Foster, Dylan J.
AU - Jiang, Nan
AU - Krishnamurthy, Akshay
AU - Mhammedi, Zakaria
N1 - Nan Jiang acknowledges funding support from NSF IIS-2112471, NSF CAREER IIS-2141781, a Google Scholar Award, and a Sloan Fellowship.
PY - 2024
Y1 - 2024
N2 - Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying (“latent”) dynamics are comparatively simple. However, outside of restrictive settings such as small latent spaces, the fundamental statistical requirements and algorithmic principles for reinforcement learning under latent dynamics are poorly understood. This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions (that is, reductions that transform an arbitrary algorithm for the latent MDP into an algorithm that can operate on rich observations) in two settings: one where the agent has access to hindsight observations of the latent dynamics [LADZ23], and one where the agent can estimate self-predictive latent models [SAGHCB20]. Together, our results serve as a first step toward a unified statistical and algorithmic theory for reinforcement learning under latent dynamics.
AB - Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying (“latent”) dynamics are comparatively simple. However, outside of restrictive settings such as small latent spaces, the fundamental statistical requirements and algorithmic principles for reinforcement learning under latent dynamics are poorly understood. This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions (that is, reductions that transform an arbitrary algorithm for the latent MDP into an algorithm that can operate on rich observations) in two settings: one where the agent has access to hindsight observations of the latent dynamics [LADZ23], and one where the agent can estimate self-predictive latent models [SAGHCB20]. Together, our results serve as a first step toward a unified statistical and algorithmic theory for reinforcement learning under latent dynamics.
UR - http://www.scopus.com/inward/record.url?scp=105000538258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105000538258&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:105000538258
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 9 December 2024 through 15 December 2024
ER -