In online reinforcement learning (RL) with large state spaces, MDPs with low-rank transitions, that is, whose transition matrix factors into the product of a left matrix and a right matrix, form a highly representative structure that enables tractable exploration. When given to the learner, the left matrix enables expressive function approximation for value-based learning, and this setting has been studied extensively (e.g., in linear MDPs). Similarly, the right matrix induces powerful models for state-occupancy densities. However, learning in low-rank MDPs using such density features has, to the best of our knowledge, never been studied, and this setting has interesting connections to leveraging the power of generative models in RL. In this work, we initiate the study of learning low-rank MDPs with density features. Our algorithm performs reward-free learning and builds an exploratory distribution in a level-by-level manner: it uses the density features for off-policy estimation of the state distributions induced by candidate policies, and constructs the exploratory data by choosing a barycentric spanner of these distributions. On the analytical side, the additive error of distribution estimation is largely incompatible with the multiplicative definition of data coverage (e.g., concentrability). In the absence of strong assumptions such as reachability, this incompatibility can lead to exponential or even infinite errors under standard analysis strategies, which we overcome via novel technical tools.
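The barycentric-spanner step mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the estimated state distributions are already available as finite-dimensional vectors, and it uses the classic swap procedure of Awerbuch and Kleinberg for computing an approximate C-barycentric spanner (a subset of d vectors such that every candidate is a linear combination of the chosen ones with coefficients bounded in magnitude by roughly C).

```python
import numpy as np

def barycentric_spanner(vectors, C=2.0, max_iter=1000):
    """Return indices of an approximate C-barycentric spanner of `vectors`.

    `vectors` is an (n, d) array whose rows are candidate (estimated)
    state distributions; the rows are assumed to span R^d. Follows the
    Awerbuch-Kleinberg swap procedure (illustrative, not the paper's code).
    """
    X = np.asarray(vectors, dtype=float)
    n, d = X.shape
    B = np.eye(d)        # placeholder basis, replaced row by row below
    idx = [-1] * d
    # Phase 1: replace each placeholder row with the candidate that
    # maximizes |det(B)|, so B becomes a nonsingular subset of the input.
    for j in range(d):
        dets = []
        for i in range(n):
            Bt = B.copy()
            Bt[j] = X[i]
            dets.append(abs(np.linalg.det(Bt)))
        i_best = int(np.argmax(dets))
        B[j] = X[i_best]
        idx[j] = i_best
    # Phase 2: swap in any candidate that grows |det(B)| by a factor > C;
    # on termination every candidate has spanner coefficients <= C.
    for _ in range(max_iter):
        improved = False
        base = abs(np.linalg.det(B))
        for j in range(d):
            for i in range(n):
                Bt = B.copy()
                Bt[j] = X[i]
                if abs(np.linalg.det(Bt)) > C * base:
                    B, idx[j] = Bt, i
                    base = abs(np.linalg.det(B))
                    improved = True
        if not improved:
            break
    return idx
```

In the algorithm's context, each row of `vectors` would be an off-policy estimate of some policy's state distribution at the current level, and the policies indexed by the returned spanner are the ones executed to collect exploratory data.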