Exponentially weighted imitation learning for batched historical data

Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu, Tong Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or “environment oracle” as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action spaces. The method does not rely on knowledge of the behavior policy, and thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.
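The reweighting idea named in the title and abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that advantage estimates for the logged actions are already available, that the monotonic weight is exp(beta * advantage) with a hypothetical temperature `beta`, and that weights are normalized over the batch (a choice made here for numerical convenience). The function names are illustrative only.

```python
import numpy as np

def exp_advantage_weights(advantages, beta=1.0):
    """Weights proportional to exp(beta * A), normalized over the batch.

    `beta` is an assumed temperature; beta = 0 gives uniform weights,
    which reduces to plain behavior cloning.
    """
    a = np.asarray(advantages, dtype=float)
    # Subtract the max before exponentiating for numerical stability;
    # this only rescales every weight by the same factor.
    w = np.exp(beta * (a - a.max()))
    return w / w.sum()

def weighted_imitation_loss(log_probs, advantages, beta=1.0):
    """Weighted negative log-likelihood of the logged actions.

    `log_probs[i]` is the learner policy's log-probability of the action
    taken in sample i; no behavior-policy probabilities are required.
    """
    w = exp_advantage_weights(advantages, beta)
    return -np.sum(w * np.asarray(log_probs, dtype=float))
```

Because the weight is increasing in the advantage, samples whose logged action looked better than average contribute more to the imitation objective, while the loss itself only needs log-probabilities from the policy being trained.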

Original language: English (US)
Pages (from-to): 6288-6297
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 31
State: Published - Dec 1 2018
Externally published: Yes
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 to Dec 8 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
