TY - GEN
T1 - Q-learning for POMDP
T2 - 58th IEEE Conference on Decision and Control, CDC 2019
AU - Wang, Tixian
AU - Taghvaei, Amirhossein
AU - Mehta, Prashant G.
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - This paper presents a Q-learning framework for learning optimal locomotion gaits in robotic systems modeled as coupled rigid bodies. Inspired by the prevalence of periodic gaits in bio-locomotion, an open-loop periodic input is assumed to effect a nominal gait. The learning problem is to learn a new (modified) gait using only partial, noisy measurements of the state. The objective of learning is to maximize a given reward, modeled as an objective function in an optimal control setting. The proposed control architecture has three main components: (i) phase modeling of the dynamics by a single phase variable; (ii) a coupled oscillator feedback particle filter to represent the posterior distribution of the phase conditioned on the sensory measurements; and (iii) a Q-learning algorithm to learn the approximate optimal control law. The architecture is illustrated with the aid of a planar two-body system, and the performance of the learning algorithm is demonstrated in a simulation environment.
UR - http://www.scopus.com/inward/record.url?scp=85082499659&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082499659&partnerID=8YFLogxK
U2 - 10.1109/CDC40024.2019.9030143
DO - 10.1109/CDC40024.2019.9030143
M3 - Conference contribution
AN - SCOPUS:85082499659
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 2758
EP - 2763
BT - 2019 IEEE 58th Conference on Decision and Control, CDC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 December 2019 through 13 December 2019
ER -