Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, Andrea Thomaz

Research output: Contribution to journal › Conference article › peer-review

Abstract

A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper, we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
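
The abstract describes Advise only at a high level. The sketch below is a minimal, illustrative reconstruction of the policy-shaping idea (not the authors' implementation): each "right"/"wrong" human label is treated as Bayesian evidence about whether an action is optimal, under an assumed feedback-consistency parameter C, and the resulting feedback distribution is combined with the agent's own action distribution by multiplying and renormalizing. All function names, the parameter C = 0.8, and the example numbers are assumptions made for illustration.

```python
import numpy as np

def feedback_policy(delta, C=0.8):
    """Probability that each action is optimal given human feedback alone.

    delta[a] = (# "right" labels) - (# "wrong" labels) for action a in the
    current state; C is the assumed probability that a human label agrees
    with the optimal policy (an illustrative assumption, not from the paper's abstract).
    """
    num = C ** delta
    return num / (num + (1.0 - C) ** delta)

def shaped_policy(pi_agent, delta, C=0.8):
    """Combine the agent's own action distribution with the feedback-derived
    probabilities by multiplying them and renormalizing."""
    pi_feedback = feedback_policy(np.asarray(delta, dtype=float), C)
    combined = pi_agent * pi_feedback
    return combined / combined.sum()

# Hypothetical example: 3 actions; the agent slightly prefers action 0, but
# the human labeled action 2 "right" twice and action 0 "wrong" once.
pi_agent = np.array([0.5, 0.3, 0.2])
delta = [-1, 0, 2]
print(shaped_policy(pi_agent, delta))
```

In this toy example, the feedback concentrated on action 2 shifts probability mass away from the agent's preferred action 0, which is the qualitative behavior the abstract attributes to using human feedback as direct policy labels.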

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
State: Published - 2013
Externally published: Yes
Event: 27th Annual Conference on Neural Information Processing Systems, NIPS 2013 - Lake Tahoe, NV, United States
Duration: Dec 5 2013 - Dec 10 2013

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
