Minimax Model Learning

Cameron Voloshin, Nan Jiang, Yisong Yue

Research output: Contribution to journal › Conference article › peer-review

Abstract

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We further show that our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
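As a rough illustration of the minimax idea described above, the sketch below alternates between an adversarial test function that maximizes a weighted model-vs-data discrepancy and a transition model that minimizes it. This is a minimal tabular sketch, not the paper's actual loss or function classes: the importance weights `w(s,a)`, the bounded tabular test-function class, the synthetic data, and all variable names are illustrative assumptions.

```python
# Illustrative minimax model-learning sketch (tabular, NumPy only).
# Min player: transition model P_theta; max player: test function f.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_samples = 5, 2, 2000

# Synthetic off-policy transitions (s, a, s') from an unknown true MDP.
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
s = rng.integers(n_states, size=n_samples)
a = rng.integers(n_actions, size=n_samples)
s_next = np.array([rng.choice(n_states, p=P_true[s[i], a[i]])
                   for i in range(n_samples)])

# Hypothetical distribution-correction weights w(s, a); in practice these
# would account for the mismatch between behavior and evaluation policies.
w = rng.uniform(0.5, 1.5, size=(n_states, n_actions))

theta = np.zeros((n_states, n_actions, n_states))  # softmax logits of P_theta
f = np.zeros(n_states)                             # tabular test function

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

lr_model, lr_f, f_radius = 0.5, 0.5, 1.0
for _ in range(500):
    P = softmax(theta)                    # current model P_theta(s'|s,a)
    wi = w[s, a]                          # per-sample weights
    # Discrepancy: E_D[ w(s,a) * ( E_{x~P_theta(.|s,a)}[f(x)] - f(s') ) ]
    model_vals = P[s, a] @ f
    loss = (wi * (model_vals - f[s_next])).mean()

    # Ascent step for the adversary f, then projection onto a bounded class.
    grad_f = np.zeros(n_states)
    np.add.at(grad_f, s_next, -wi / n_samples)
    for x in range(n_states):
        grad_f[x] += (wi * P[s, a, x]).sum() / n_samples
    f = np.clip(f + lr_f * grad_f, -f_radius, f_radius)

    # Descent step for the model logits (softmax derivative of E[f]).
    grad_theta = np.zeros_like(theta)
    outer = P[s, a] * (f[None, :] - model_vals[:, None])
    np.add.at(grad_theta, (s, a), (wi / n_samples)[:, None] * outer)
    theta -= lr_model * grad_theta

print("final minimax loss estimate:", loss)
```

Under this sketch's assumptions, the learned model is the one whose predicted next-state values are hardest to distinguish from the data, as judged by the worst-case test function on reweighted off-policy samples.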

Original language: English (US)
Pages (from-to): 1612-1620
Number of pages: 9
Journal: Proceedings of Machine Learning Research
Volume: 130
State: Published - 2021
Externally published: Yes
Event: 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021 - Virtual, Online, United States
Duration: Apr 13, 2021 - Apr 15, 2021

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
