Learning to play in a day: Faster deep reinforcement learning by optimality tightening

Frank S. He, Yang Liu, Alexander Gerhard Schwing, Jian Peng

Research output: Contribution to conference › Paper

Abstract

We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
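For readers who want a concrete picture of the approach the abstract describes, below is a minimal NumPy sketch of the optimality-tightening idea: multi-step returns along a sampled trajectory yield lower and upper bounds on the optimal action value Q*(s_t, a_t), and violations of those bounds are penalized on top of the usual one-step Bellman error. The penalty weight lam, the horizon K, and all function names are illustrative assumptions, not taken from the authors' released implementation.

import numpy as np

gamma = 0.99  # discount factor
lam = 4.0     # penalty weight on bound violations (illustrative choice)
K = 4         # number of steps used to form each bound (illustrative choice)

def lower_bound(rewards, q_next_max, t):
    # L_{t,k} = sum_{j<k} gamma^j r_{t+j} + gamma^k max_a Q(s_{t+k}, a)
    # lower-bounds Q*(s_t, a_t): the realized actions after step t can do
    # no better than acting optimally. Take the tightest bound over k.
    bounds = []
    for k in range(1, K + 1):
        ret = sum(gamma ** j * rewards[t + j] for j in range(k))
        bounds.append(ret + gamma ** k * q_next_max[t + k])
    return max(bounds)

def upper_bound(rewards, q_taken, t):
    # U_{t,k} = gamma^(-k) * (Q(s_{t-k}, a_{t-k}) - sum_{j<k} gamma^j r_{t-k+j})
    # upper-bounds Q*(s_t, a_t) using states that precede t on the trajectory.
    bounds = []
    for k in range(1, min(K, t) + 1):
        ret = sum(gamma ** j * rewards[t - k + j] for j in range(k))
        bounds.append(gamma ** (-k) * (q_taken[t - k] - ret))
    return min(bounds) if bounds else np.inf

def penalized_loss(q_sa, target, lo, up):
    # Standard one-step Bellman error, plus quadratic penalties whenever
    # the estimate drops below its lower bound or exceeds its upper bound.
    bellman = (q_sa - target) ** 2
    violations = max(lo - q_sa, 0.0) ** 2 + max(q_sa - up, 0.0) ** 2
    return bellman + lam * violations

# Toy trajectory: per-step rewards, max_a Q(s, a) at each state, and the
# Q-value of the action actually taken at each state (random stand-ins).
rewards = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
q_next_max = np.random.rand(len(rewards) + 1)
q_taken = np.random.rand(len(rewards))

t = 4
target = rewards[t] + gamma * q_next_max[t + 1]
print(penalized_loss(q_taken[t], target,
                     lower_bound(rewards, q_next_max, t),
                     upper_bound(rewards, q_taken, t)))

Intuitively, the lower bounds pull Q(s_t, a_t) up using rewards already observed after step t, while the upper bounds push it down using states seen before t; each update therefore propagates reward information across many time steps instead of one, which is what reduces training time.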

Original language: English (US)
State: Published - Jan 1 2019
Event: 5th International Conference on Learning Representations, ICLR 2017 - Toulon, France
Duration: Apr 24 2017 → Apr 26 2017

Conference

Conference: 5th International Conference on Learning Representations, ICLR 2017
Country: France
City: Toulon
Period: 4/24/17 → 4/26/17

Fingerprint

  • Reinforcement learning
  • Constrained optimization
  • Learning environment
  • Optimality
  • Reward
  • Performance
  • Time

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Cite this

He, F. S., Liu, Y., Schwing, A. G., & Peng, J. (2019). Learning to play in a day: Faster deep reinforcement learning by optimality tightening. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.

Learning to play in a day: Faster deep reinforcement learning by optimality tightening. / He, Frank S.; Liu, Yang; Schwing, Alexander Gerhard; Peng, Jian.

2019. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.

Research output: Contribution to conference › Paper

He, FS, Liu, Y, Schwing, AG & Peng, J 2019, 'Learning to play in a day: Faster deep reinforcement learning by optimality tightening', Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 4/24/17 - 4/26/17.
He FS, Liu Y, Schwing AG, Peng J. Learning to play in a day: Faster deep reinforcement learning by optimality tightening. 2019. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.
He, Frank S. ; Liu, Yang ; Schwing, Alexander Gerhard ; Peng, Jian. / Learning to play in a day: Faster deep reinforcement learning by optimality tightening. Paper presented at 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.
@conference{477d83fe0d7e423382d9db0dabf8ec86,
title = "Learning to play in a day: Faster deep reinforcement learning by optimality tightening",
abstract = "We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.",
author = "He, {Frank S.} and Yang Liu and Schwing, {Alexander Gerhard} and Jian Peng",
year = "2019",
month = "1",
day = "1",
language = "English (US)",
note = "5th International Conference on Learning Representations, ICLR 2017 ; Conference date: 24-04-2017 Through 26-04-2017",

}

TY - CONF

T1 - Learning to play in a day

T2 - Faster deep reinforcement learning by optimality tightening

AU - He, Frank S.

AU - Liu, Yang

AU - Schwing, Alexander Gerhard

AU - Peng, Jian

PY - 2019/1/1

Y1 - 2019/1/1

N2 - We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.

AB - We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.

UR - http://www.scopus.com/inward/record.url?scp=85070951616&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85070951616&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85070951616

ER -