### Abstract

The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When full state observation is available, Q-learning finds the optimal action-value function given the current state and action (Q-function). However, Q-learning can perform poorly when full state observation is not available. In this paper, we formulate POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates the POMDP parameters and the Q-function concurrently. We also show that the POMDP estimate converges to a set of stationary points of the maximum likelihood objective, and that the Q-function estimate converges to a fixed point satisfying the Bellman optimality equation weighted by the invariant distribution of the state belief determined by the HMM estimation process.
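The paper's method pairs a Bayes-filter belief update (the HMM estimation side) with a belief-weighted Q-learning step. The sketch below is not the authors' algorithm; it is a minimal illustration, under assumed known transition tensor `T` and observation matrix `O`, of the two ingredients the abstract names: recursive belief estimation and a TD update weighted by the current belief.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter step: posterior belief over hidden states after
    taking action a and receiving observation o.
    T[a] is the (S x S) transition matrix for action a; O[s, o] is
    the probability of observing o in state s."""
    pred = b @ T[a]          # predict: marginalize over current state
    post = pred * O[:, o]    # correct: weight by observation likelihood
    return post / post.sum() # renormalize to a probability vector

def q_update(Q, b, a, r, b_next, alpha=0.1, gamma=0.95):
    """Belief-weighted Q-learning step on a tabular Q (S x A).
    The TD target uses the expected max-Q under the next belief, and
    the update is distributed across states in proportion to the
    current belief."""
    target = r + gamma * np.max(b_next @ Q)  # expected optimal value at next belief
    td = target - b @ Q[:, a]                # belief-averaged TD error
    Q[:, a] += alpha * td * b                # credit states by belief weight
    return Q
```

In the paper, `T` and `O` are themselves estimated online by the recursive HMM procedure rather than assumed known, which is what yields the belief-distribution-weighted fixed point stated in the abstract.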

Original language | English (US) |
---|---|

Title of host publication | 2019 American Control Conference, ACC 2019 |

Publisher | Institute of Electrical and Electronics Engineers Inc. |

Pages | 2366-2371 |

Number of pages | 6 |

ISBN (Electronic) | 9781538679265 |

State | Published - Jul 2019 |

Event | 2019 American Control Conference, ACC 2019, Philadelphia, United States. Duration: Jul 10 2019 → Jul 12 2019 |

### Publication series

Name | Proceedings of the American Control Conference |
---|---|

Volume | 2019-July |

ISSN (Print) | 0743-1619 |

### Conference

Conference | 2019 American Control Conference, ACC 2019 |
---|---|

Country | United States |

City | Philadelphia |

Period | 7/10/19 → 7/12/19 |

### ASJC Scopus subject areas

- Electrical and Electronic Engineering

### Cite this

Yoon, H. J., Lee, D., & Hovakimyan, N. (2019). Hidden Markov model estimation-based Q-learning for partially observable Markov decision process. In *2019 American Control Conference, ACC 2019* (pp. 2366-2371). [8814849] (Proceedings of the American Control Conference; Vol. 2019-July). Institute of Electrical and Electronics Engineers Inc.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Scopus record: http://www.scopus.com/inward/record.url?scp=85072294610&partnerID=8YFLogxK