Q-learning for POMDP: An application to learning locomotion gaits

Tixian Wang, Amirhossein Taghvaei, Prashant G. Mehta

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


This paper presents a Q-learning framework for learning optimal locomotion gaits in robotic systems modeled as coupled rigid bodies. Inspired by the prevalence of periodic gaits in bio-locomotion, an open-loop periodic input is assumed to (say) affect a nominal gait. The learning problem is to learn a new (modified) gait using only partial, noisy measurements of the state. The objective of learning is to maximize a given reward, modeled as an objective function in an optimal control setting. The proposed control architecture has three main components: (i) phase modeling of the dynamics by a single phase variable; (ii) a coupled oscillator feedback particle filter to represent the posterior distribution of the phase conditioned on the sensory measurements; and (iii) a Q-learning algorithm to learn the approximate optimal control law. The architecture is illustrated with the aid of a planar two-body system. The performance of the learning is demonstrated in a simulation environment.
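The three components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it swaps the coupled oscillator feedback particle filter for a plain bootstrap particle filter, and the phase dynamics, observation model, reward, action set, and all parameter values (`omega`, `dt`, noise levels, learning rates) are hypothetical choices made only for illustration. The Q-function is taken to be linear in belief features obtained by averaging a small Fourier basis over the particles.

```python
import numpy as np

def features(particles):
    """Belief features: particle average of a small Fourier basis in the phase."""
    return np.array([1.0, np.cos(particles).mean(), np.sin(particles).mean()])

def filter_step(particles, u, y, rng, omega=1.0, dt=0.1, sig_proc=0.05, sig_obs=0.3):
    """One bootstrap particle filter step (a stand-in for the feedback particle filter)."""
    n = particles.size
    # Propagate each particle through the assumed phase model.
    particles = (particles + (omega + u) * dt
                 + sig_proc * rng.standard_normal(n)) % (2 * np.pi)
    # Weight by the likelihood of the noisy observation y = cos(theta) + noise.
    logw = -0.5 * ((y - np.cos(particles)) / sig_obs) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample to obtain an equally weighted particle set.
    return particles[rng.choice(n, size=n, p=w)]

def run(steps=2000, n_particles=200, seed=0):
    rng = np.random.default_rng(seed)
    actions = np.array([-0.5, 0.0, 0.5])      # hypothetical control perturbations
    weights = np.zeros((actions.size, 3))     # one linear Q-function per action
    alpha, gamma, eps = 0.05, 0.95, 0.1       # step size, discount, exploration rate
    omega, dt = 1.0, 0.1
    theta = 0.0                               # true (hidden) phase
    particles = rng.uniform(0.0, 2 * np.pi, n_particles)
    phi = features(particles)
    for _ in range(steps):
        # Epsilon-greedy action selection on the belief features.
        if rng.random() < eps:
            a = int(rng.integers(actions.size))
        else:
            a = int(np.argmax(weights @ phi))
        u = actions[a]
        # Hypothetical reward: control aligned with sin(theta), minus a control cost.
        reward = np.sin(theta) * u - 0.1 * u ** 2
        # Simulate the hidden phase and a partial, noisy measurement of it.
        theta = (theta + (omega + u) * dt + 0.05 * rng.standard_normal()) % (2 * np.pi)
        y = np.cos(theta) + 0.3 * rng.standard_normal()
        # Update the belief, then apply a semi-gradient Q-learning update.
        particles = filter_step(particles, u, y, rng)
        phi_next = features(particles)
        td_error = reward + gamma * np.max(weights @ phi_next) - weights[a] @ phi
        weights[a] += alpha * td_error * phi
        phi = phi_next
    return weights, particles

if __name__ == "__main__":
    learned_weights, _ = run()
    print(learned_weights)
```

Averaging the basis over particles makes the Q-function depend on the posterior of the phase rather than on the hidden phase itself, which is the sense in which the learning problem is partially observed.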

Original language: English (US)
Title of host publication: 2019 IEEE 58th Conference on Decision and Control, CDC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781728113982
State: Published - Dec 2019
Event: 58th IEEE Conference on Decision and Control, CDC 2019 - Nice, France
Duration: Dec 11 2019 to Dec 13 2019

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370


Conference: 58th IEEE Conference on Decision and Control, CDC 2019

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization

