Trajectory-based probabilistic policy gradient for learning locomotion behaviors

Sungjoon Choi, Joohyung Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy function as a probability distribution over trajectories and train the policy using a deep latent variable model to achieve sample-efficient skill learning. We first evaluate the sample efficiency of DLPG against state-of-the-art reinforcement learning methods in simulated environments. Then, we apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills: turning left, going straight, and turning right. We demonstrate that, by properly designing two reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity.

Original language: English (US)
Title of host publication: 2019 International Conference on Robotics and Automation, ICRA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-7
Number of pages: 7
ISBN (Electronic): 9781538660263
State: Published - May 2019
Externally published: Yes
Event: 2019 International Conference on Robotics and Automation, ICRA 2019 - Montreal, Canada
Duration: May 20 2019 → May 24 2019

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
Volume: 2019-May
ISSN (Print): 1050-4729

Conference

Conference: 2019 International Conference on Robotics and Automation, ICRA 2019
Country/Territory: Canada
City: Montreal
Period: 5/20/19 → 5/24/19

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering
