Action-dependent control variates for policy optimization via Stein’s identity

Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu

Research output: Contribution to conference › Paper

Abstract

Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, these methods still often suffer from high variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce the variance of policy gradient estimators. Motivated by Stein's identity, our method extends the control variates used in REINFORCE and advantage actor-critic by introducing more general, action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of state-of-the-art policy gradient approaches.
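The construction described in the abstract can be sketched as follows (an illustration only; the baseline φ(s, a), the reparameterization a = f_θ(s, ξ), and the estimator below are our notation, not quoted from the paper). For a policy π_θ(a | s) over continuous actions and any sufficiently smooth baseline φ(s, a) such that π_θ(a | s) φ(s, a) vanishes at the boundary of the action space, Stein's identity gives

\[
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[ \nabla_a \log \pi_\theta(a \mid s)\, \phi(s, a) + \nabla_a \phi(s, a) \right] = 0 .
\]

Because this expectation is zero, an action-dependent baseline can be subtracted from the return estimate without introducing bias, provided the correction implied by the identity is added back. Assuming a reparameterizable policy a = f_θ(s, ξ) with ξ drawn from a fixed noise distribution, this suggests a gradient estimator of the form

\[
\widehat{\nabla_\theta J} \;=\; \frac{1}{n} \sum_{i=1}^{n} \Big[ \nabla_\theta \log \pi_\theta(a_i \mid s_i)\,\big( \hat{Q}(s_i, a_i) - \phi(s_i, a_i) \big) \;+\; \nabla_\theta f_\theta(s_i, \xi_i)^{\top} \nabla_a \phi(s_i, a_i) \Big],
\]

which reduces to the familiar state-dependent baselines of REINFORCE and advantage actor-critic when φ depends only on s, since the correction term then vanishes.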

Original language: English (US)
State: Published - Jan 1 2018
Event: 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018 - May 3 2018

Conference

Conference: 6th International Conference on Learning Representations, ICLR 2018
Country: Canada
City: Vancouver
Period: 4/30/18 - 5/3/18

Fingerprint

Gradient methods
Reinforcement learning
Sample efficiency
Policy gradient
Actor-critic

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Computer Science Applications
  • Linguistics and Language

Cite this

Liu, H., Feng, Y., Mao, Y., Zhou, D., Peng, J., & Liu, Q. (2018). Action-dependent control variates for policy optimization via Stein’s identity. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.
