Extended Abstract: Learning in Low-rank MDPs with Density Features

Audrey Huang, Jinglin Chen, Nan Jiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In online reinforcement learning (RL) with large state spaces, MDPs with low-rank transitions (that is, MDPs whose transition matrix can be factored into the product of two matrices, left and right) are a highly representative structure that enables tractable exploration. When given to the learner, the left matrix enables expressive function approximation for value-based learning, and this setting has been studied extensively (e.g., in linear MDPs). Similarly, the right matrix induces powerful models for state-occupancy densities. However, to the best of our knowledge, using such density features to learn in low-rank MDPs has never been studied, and it is a setting with interesting connections to leveraging the power of generative models in RL. In this work, we initiate the study of learning low-rank MDPs with density features. Our algorithm performs reward-free learning and builds an exploratory distribution in a level-by-level manner. It uses the density features for off-policy estimation of the policies' state distributions, and constructs the exploratory data by choosing the barycentric spanner of these distributions. From an analytical point of view, the additive error of distribution estimation is largely incompatible with the multiplicative definition of data coverage (e.g., concentrability). In the absence of strong assumptions like reachability, this incompatibility may lead to exponential or even infinite errors under standard analysis strategies, which we overcome via novel technical tools.
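To make the factorization concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a small tabular setting in which the left factor `phi` assigns each state-action pair a probability vector over latent factors and each row of the right factor `mu` is a density over next states (a special case of low-rank structure). All function names and sizes are hypothetical. The sketch shows why the right matrix models state-occupancy densities: any next-state distribution is a mixture of the rows of `mu`.

```python
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def build_low_rank_transition(num_sa, num_states, rank, seed=0):
    """Build P = phi @ mu from random factors (hypothetical toy construction).

    phi: num_sa x rank, each row a probability vector over latent factors.
    mu:  rank x num_states, each row a density over next states.
    """
    rng = random.Random(seed)
    phi = [normalize([rng.random() for _ in range(rank)]) for _ in range(num_sa)]
    mu = [normalize([rng.random() for _ in range(num_states)]) for _ in range(rank)]
    transition = [
        [sum(phi[i][k] * mu[k][s] for k in range(rank)) for s in range(num_states)]
        for i in range(num_sa)
    ]
    return phi, mu, transition


def next_state_distribution(d_sa, transition):
    """Push a state-action distribution through the full transition matrix."""
    num_states = len(transition[0])
    return [
        sum(d_sa[i] * transition[i][s] for i in range(len(d_sa)))
        for s in range(num_states)
    ]


def density_coefficients(d_sa, phi):
    """Mixing weights theta[k] = sum_i d_sa[i] * phi[i][k] over latent factors."""
    rank = len(phi[0])
    return [sum(d_sa[i] * phi[i][k] for i in range(len(d_sa))) for k in range(rank)]


def mix_density_features(theta, mu):
    """Combine the rows of mu (the density features) with weights theta."""
    num_states = len(mu[0])
    return [sum(theta[k] * mu[k][s] for k in range(len(theta))) for s in range(num_states)]


if __name__ == "__main__":
    phi, mu, transition = build_low_rank_transition(num_sa=6, num_states=5, rank=2)
    d_sa = [1.0 / 6] * 6  # uniform distribution over state-action pairs
    via_transition = next_state_distribution(d_sa, transition)
    via_features = mix_density_features(density_coefficients(d_sa, phi), mu)
    print(via_transition)
    print(via_features)
```

Both printed vectors agree up to floating-point error: the next-state distribution is determined by only `rank` coefficients, regardless of how many states there are. This is the property that makes density features a compact model class for occupancy estimation.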

Original language: English (US)
Title of host publication: 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665451819
State: Published - 2023
Event: 57th Annual Conference on Information Sciences and Systems, CISS 2023 - Baltimore, United States
Duration: Mar 22 2023 to Mar 24 2023

Publication series

Name: 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023

Conference

Conference: 57th Annual Conference on Information Sciences and Systems, CISS 2023
Country/Territory: United States
City: Baltimore
Period: 3/22/23 to 3/24/23

Keywords

  • density features
  • low-rank MDPs
  • reinforcement learning

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Artificial Intelligence
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
