Q-learning for POMDP: An application to learning locomotion gaits

Tixian Wang, Amirhossein Taghvaei, Prashant G. Mehta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a Q-learning framework for learning optimal locomotion gaits in robotic systems modeled as coupled rigid bodies. Inspired by the prevalence of periodic gaits in bio-locomotion, an open-loop periodic input is assumed to produce a nominal gait. The learning problem is to learn a new (modified) gait using only partial, noisy measurements of the state. The objective of learning is to maximize a given reward, modeled as the objective function of an optimal control problem. The proposed control architecture has three main components: (i) phase modeling of the dynamics by a single phase variable; (ii) a coupled oscillator feedback particle filter to represent the posterior distribution of the phase conditioned on the sensory measurements; and (iii) a Q-learning algorithm to learn the approximate optimal control law. The architecture is illustrated with the aid of a planar two-body system, and the performance of the learning algorithm is demonstrated in simulation.
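The abstract does not state the filter equations. As a rough illustration of component (ii) only, the sketch below implements a generic feedback particle filter (FPF) for a single phase variable θ on the circle. It is not the authors' formulation. It assumes phase dynamics dθ = ω dt + σ dB, a scalar observation dZ = cos θ dt + dW with noise variance R, and a Galerkin approximation of the gain using the Fourier basis {cos θ, sin θ}. The function names `galerkin_gain`, `fpf_step`, and `phase_estimate` and all parameter values are hypothetical.

```python
import math
import random


def galerkin_gain(particles, h, R, eps=1e-9):
    """Galerkin approximation of the FPF gain K = phi' with phi = c1*cos + c2*sin.

    Weak form of the gain equation: E[phi' psi_k'] = E[(h - h_hat) psi_k] / R
    for psi_1 = cos, psi_2 = sin. eps is a small ridge term (an assumption)
    that keeps the 2x2 system solvable if the particles collapse.
    """
    n = len(particles)
    h_vals = [h(t) for t in particles]
    h_hat = sum(h_vals) / n
    s = [math.sin(t) for t in particles]
    c = [math.cos(t) for t in particles]
    # A_kl = E[psi_k' psi_l'], with psi_1' = -sin and psi_2' = cos
    a11 = sum(si * si for si in s) / n + eps
    a12 = -sum(si * ci for si, ci in zip(s, c)) / n
    a22 = sum(ci * ci for ci in c) / n + eps
    # b_k = E[(h - h_hat) psi_k] / R
    b1 = sum((hv - h_hat) * ci for hv, ci in zip(h_vals, c)) / (n * R)
    b2 = sum((hv - h_hat) * si for hv, si in zip(h_vals, s)) / (n * R)
    det = a11 * a22 - a12 * a12
    c1 = (a22 * b1 - a12 * b2) / det
    c2 = (a11 * b2 - a12 * b1) / det

    def K(t):   # gain K = phi'
        return -c1 * math.sin(t) + c2 * math.cos(t)

    def dK(t):  # derivative K' = phi''
        return -c1 * math.cos(t) - c2 * math.sin(t)

    return K, dK, h_hat


def fpf_step(particles, dZ, dt, omega, sigma, R, h=math.cos, rng=random):
    """One Euler-Maruyama step of the FPF particle SDEs (Ito form)."""
    K, dK, h_hat = galerkin_gain(particles, h, R)
    new = []
    for th in particles:
        dI = dZ - 0.5 * (h(th) + h_hat) * dt          # innovation increment
        dth = (omega * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
               + K(th) * dI
               + 0.5 * R * K(th) * dK(th) * dt)       # Wong-Zakai correction
        new.append((th + dth) % (2 * math.pi))
    return new


def phase_estimate(particles):
    """Circular mean of the particles, in [0, 2*pi)."""
    s = sum(math.sin(t) for t in particles) / len(particles)
    c = sum(math.cos(t) for t in particles) / len(particles)
    return math.atan2(s, c) % (2 * math.pi)


if __name__ == "__main__":
    rng = random.Random(1)
    dt, omega, sigma, R = 0.01, 1.0, 0.1, 0.1
    theta = 0.0                                       # true (hidden) phase
    parts = [rng.uniform(0, 2 * math.pi) for _ in range(100)]
    for _ in range(1000):
        theta = (theta + omega * dt
                 + sigma * math.sqrt(dt) * rng.gauss(0, 1)) % (2 * math.pi)
        dZ = math.cos(theta) * dt + math.sqrt(R * dt) * rng.gauss(0, 1)
        parts = fpf_step(parts, dZ, dt, omega, sigma, R, rng=rng)
    est = phase_estimate(parts)
    err = math.atan2(math.sin(est - theta), math.cos(est - theta))
    print(f"true {theta:.2f}, estimate {est:.2f}, error {err:+.2f}")
```

The Fourier basis is chosen because it makes the gain itself a sinusoid in θ, so each particle behaves like an oscillator coupled to the population through the shared coefficients c1 and c2. For uniformly spread particles and h = cos θ, the approximation recovers the exact gain K(θ) = -sin θ / R.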

Original language: English (US)
Title of host publication: 2019 IEEE 58th Conference on Decision and Control, CDC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2758-2763
Number of pages: 6
ISBN (Electronic): 9781728113982
DOI: https://doi.org/10.1109/CDC40024.2019.9030143
State: Published - Dec 2019
Event: 58th IEEE Conference on Decision and Control, CDC 2019 - Nice, France
Duration: Dec 11, 2019 to Dec 13, 2019

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2019-December
ISSN (Print): 0743-1546

Conference

Conference: 58th IEEE Conference on Decision and Control, CDC 2019
Country: France
City: Nice
Period: 12/11/19 to 12/13/19

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization


Citation

Wang, T., Taghvaei, A., & Mehta, P. G. (2019). Q-learning for POMDP: An application to learning locomotion gaits. In 2019 IEEE 58th Conference on Decision and Control, CDC 2019 (pp. 2758-2763). [9030143] (Proceedings of the IEEE Conference on Decision and Control; Vol. 2019-December). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CDC40024.2019.9030143