Abstract
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods with a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to random initializations and can easily be implemented in a parallel manner. On several continuous control problems, we find that SVPG versions of REINFORCE and advantage actor-critic algorithms are greatly improved in terms of both average return and data efficiency.
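As a rough illustration of the kind of update such a method takes (the notation below, including the particle count $n$, kernel $k$, prior $q_0$, temperature $\alpha$, and step size $\epsilon$, is an assumption for exposition and is not quoted from this abstract), each of $n$ policy parameter particles $\theta_i$ could be moved by a kernel-weighted policy gradient plus a repulsive kernel-gradient term:

$$
\theta_i \leftarrow \theta_i + \epsilon\,\phi(\theta_i), \qquad
\phi(\theta_i) = \frac{1}{n}\sum_{j=1}^{n}\Big[\nabla_{\theta_j}\Big(\tfrac{1}{\alpha}J(\theta_j) + \log q_0(\theta_j)\Big)\,k(\theta_j,\theta_i) + \nabla_{\theta_j} k(\theta_j,\theta_i)\Big]
$$

Under this reading, the first term drives each particle toward higher expected return $J$, with information shared across particles through the kernel, while the second, repulsive term pushes the particles apart and is what keeps the resulting set of policies diverse.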
| Original language | English (US) |
| --- | --- |
| State | Published - 2017 |
| Event | 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017 - Sydney, Australia. Duration: Aug 11 2017 → Aug 15 2017 |
Other

| Other | 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017 |
| --- | --- |
| Country/Territory | Australia |
| City | Sydney |
| Period | 8/11/17 → 8/15/17 |
ASJC Scopus subject areas
- Artificial Intelligence