Stein variational policy gradient

Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng

Research output: Contribution to conference › Paper

Abstract

Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to random initializations and can easily be implemented in a parallel manner. On several continuous control problems, we find that SVPG versions of REINFORCE and advantage actor-critic algorithms are greatly improved in terms of both average return and data efficiency.
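The abstract describes the method only at a high level; the sketch below is one way to read the update it summarizes: a set of policy-parameter particles is moved by kernel-smoothed policy gradients (the driving term) plus kernel gradients (the repulsive term that keeps the policies diverse). Everything named here (J_grad, rbf_kernel, svpg_step, the flat prior, the median-bandwidth RBF kernel, the step size) is an illustrative assumption, not the authors' implementation.

# Minimal sketch of an SVPG-style particle update (illustrative, not the authors' code).
# Assumptions: J_grad(theta) returns a policy-gradient estimate of dJ/dtheta for one
# particle (e.g. from REINFORCE or advantage actor-critic rollouts), alpha is the
# exploration temperature, the prior over parameters is flat (so its log-gradient is
# zero), and an RBF kernel with the median bandwidth heuristic provides the repulsion.
import numpy as np

def rbf_kernel(thetas):
    # Pairwise squared distances between the n particles (each a flat d-vector).
    diffs = thetas[:, None, :] - thetas[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Median-distance bandwidth heuristic commonly used with SVGD.
    h = np.median(sq_dists) / np.log(thetas.shape[0] + 1) + 1e-8
    K = np.exp(-sq_dists / h)
    # grad_K[i, j] = d k(theta_i, theta_j) / d theta_i
    grad_K = (-2.0 / h) * K[:, :, None] * diffs
    return K, grad_K

def svpg_step(thetas, J_grad, alpha=1.0, step_size=1e-3):
    # thetas: (n, d) array of policy-parameter particles, one row per policy.
    n = thetas.shape[0]
    grads = np.stack([J_grad(theta) for theta in thetas])  # (n, d) policy gradients
    K, grad_K = rbf_kernel(thetas)
    # Driving force: kernel-smoothed policy gradients, scaled by 1/alpha.
    # Repulsive force: sum over j of grad_{theta_j} k(theta_j, theta_i),
    # which pushes particles apart and keeps the set of policies diverse.
    phi = (K @ (grads / alpha) + grad_K.sum(axis=0)) / n
    return thetas + step_size * phi

In this reading, a smaller alpha weights the policy-gradient (exploitation) term more heavily relative to the repulsive (exploration) term, and each particle's J_grad can be computed from its own rollouts, which is what makes the update easy to run in parallel.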

Original language: English (US)
State: Published - Jan 1 2017
Event: 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017 - Sydney, Australia
Duration: Aug 11 2017 - Aug 15 2017

Other

Other: 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017
Country: Australia
City: Sydney
Period: 8/11/17 - 8/15/17

Fingerprint

Gradient methods
Reinforcement learning
Entropy

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Liu, Y., Ramachandran, P., Liu, Q., & Peng, J. (2017). Stein variational policy gradient. Paper presented at 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia.

Stein variational policy gradient. / Liu, Yang; Ramachandran, Prajit; Liu, Qiang; Peng, Jian.

2017. Paper presented at 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia.

Research output: Contribution to conference › Paper

Liu, Y, Ramachandran, P, Liu, Q & Peng, J 2017, 'Stein variational policy gradient', Paper presented at 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, 8/11/17 - 8/15/17.
Liu Y, Ramachandran P, Liu Q, Peng J. Stein variational policy gradient. 2017. Paper presented at 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia.
Liu, Yang; Ramachandran, Prajit; Liu, Qiang; Peng, Jian. / Stein variational policy gradient. Paper presented at 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia.
@conference{7e0ee5d0861e4535a49cd11d774ac70a,
title = "Stein variational policy gradient",
abstract = "Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to random initializations and can easily be implemented in a parallel manner. On several continuous control problems, we find that SVPG versions of REINFORCE and advantage actor-critic algorithms are greatly improved in terms of both average return and data efficiency.",
author = "Yang Liu and Prajit Ramachandran and Qiang Liu and Jian Peng",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
note = "33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017 ; Conference date: 11-08-2017 Through 15-08-2017",

}

TY - CONF

T1 - Stein variational policy gradient

AU - Liu, Yang

AU - Ramachandran, Prajit

AU - Liu, Qiang

AU - Peng, Jian

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to random initializations and can easily be implemented in a parallel manner. On several continuous control problems, we find that SVPG versions of REINFORCE and advantage actor-critic algorithms are greatly improved in terms of both average return and data efficiency.

AB - Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to random initializations and can easily be implemented in a parallel manner. On several continuous control problems, we find that SVPG versions of REINFORCE and advantage actor-critic algorithms are greatly improved in terms of both average return and data efficiency.

UR - http://www.scopus.com/inward/record.url?scp=85031126701&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85031126701&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85031126701

ER -