Disentangling Controllable Object Through Video Prediction Improves Visual Reinforcement Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In many vision-based reinforcement learning (RL) problems, the agent controls a movable object in its visual field, e.g., the player's avatar in video games or the robotic arm in visual grasping and manipulation. Leveraging action-conditioned video prediction, we propose an end-to-end learning framework to disentangle the controllable object from the observation signal. The disentangled representation is shown to be useful for RL when provided as additional observation channels to the agent. Experiments on a set of Atari games with the popular Double DQN algorithm demonstrate improved sample efficiency and game performance (from 222.8% to 261.4% measured in normalized game scores, with prediction bonus reward).
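The abstract's core mechanism, feeding the disentangled controllable-object representation to the agent as extra observation channels, can be illustrated with a minimal sketch. This is not the paper's implementation: the actual mask is learned via action-conditioned video prediction, whereas here a placeholder mask and the function name `augment_observation` are assumptions for illustration.

```python
import numpy as np

def augment_observation(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack a controllable-object mask onto the observation as an
    additional channel (the disentangled representation would come from
    the learned video-prediction model in the actual method)."""
    assert frame.shape[:2] == mask.shape, "mask must match frame spatially"
    return np.concatenate([frame, mask[..., None]], axis=-1)

# Example: an 84x84 grayscale Atari frame plus one mask channel.
frame = np.zeros((84, 84, 1), dtype=np.float32)
mask = np.zeros((84, 84), dtype=np.float32)
mask[40:44, 40:44] = 1.0  # placeholder for the predicted avatar region
obs = augment_observation(frame, mask)
print(obs.shape)  # (84, 84, 2)
```

The augmented observation then goes to the Q-network unchanged except for its input channel count, which is why the approach composes with off-the-shelf algorithms such as Double DQN.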

Original language: English (US)
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3672-3676
Number of pages: 5
ISBN (Electronic): 9781509066315
DOIs
State: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: May 4, 2020 – May 8, 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country: Spain
City: Barcelona
Period: 5/4/20 – 5/8/20

Keywords

  • reinforcement learning
  • representation learning
  • sample efficiency
  • video prediction

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

