Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

Research output: Contribution to journal › Conference article › peer-review

Abstract

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance. We also distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced and we invite interested researchers to further contribute to the benchmark.
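For readers new to the problem, the sketch below illustrates the simplest estimator in the OPE family, trajectory-wise importance sampling, which reweights returns observed under a behavior policy by the likelihood ratio of an evaluation policy. This is a minimal illustrative sketch, not the COBS interface; the function name and the (state, action, reward) data layout are assumptions made for this example.

import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    # Trajectory-wise importance sampling: estimate the value of the
    # evaluation policy pi_e from data collected under the behavior
    # policy pi_b. Each trajectory is a list of (state, action, reward)
    # tuples; pi_e and pi_b map (state, action) to an action probability.
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
            ret += (gamma ** t) * r            # discounted return
        estimates.append(weight * ret)         # reweighted return
    return float(np.mean(estimates))

Estimators of this form are unbiased but can suffer high variance, which is one reason a benchmark stressing diverse experimental conditions is useful for comparing OPE methods.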

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
State: Published - 2021
Externally published: Yes
Event: 35th Conference on Neural Information Processing Systems - Track on Datasets and Benchmarks, NeurIPS Datasets and Benchmarks 2021 - Virtual, Online
Duration: Dec 6, 2021 - Dec 14, 2021

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
