Hidden Markov model estimation-based Q-learning for partially observable Markov decision process

Hyung Jin Yoon, Donghwan Lee, Naira Hovakimyan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The objective is to study an on-line Hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) with finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current state and action (the Q-function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates the POMDP parameters and the Q-function concurrently. We also show that the POMDP estimate converges to a set of stationary points of the maximum likelihood estimation problem, and that the Q-function estimate converges to a fixed point satisfying the Bellman optimality equation weighted by the invariant distribution of the state belief determined by the HMM estimation process.
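
As a rough illustration of the approach described in the abstract, the sketch below couples an HMM belief filter with a belief-weighted tabular Q-learning update. It is not the authors' implementation: the dimensions, the parameters T and O (which stand in for the estimates the paper's recursive procedure would produce on-line), and the placeholder environment responses are all assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's algorithm): belief-filtered Q-learning
# for a finite POMDP, given already-estimated HMM/POMDP parameters.
import numpy as np

n_states, n_actions, n_obs = 4, 2, 3
rng = np.random.default_rng(0)

# Assumed (estimated) parameters: T[a][s, s'] = P(s' | s, a), O[s, o] = P(o | s).
T = np.stack([np.full((n_states, n_states), 1.0 / n_states) for _ in range(n_actions)])
O = np.full((n_states, n_obs), 1.0 / n_obs)

Q = np.zeros((n_states, n_actions))        # tabular Q-function over hidden states
belief = np.full(n_states, 1.0 / n_states)  # state belief maintained by the HMM filter
alpha, gamma = 0.1, 0.95

def belief_update(belief, action, obs):
    """One HMM filter step: predict with T[action], correct with O[:, obs], normalize."""
    predicted = T[action].T @ belief
    corrected = predicted * O[:, obs]
    return corrected / corrected.sum()

def q_update(Q, belief, next_belief, action, reward):
    """Q-learning step with the TD error weighted by the current state belief."""
    target = reward + gamma * next_belief @ Q.max(axis=1)
    td_error = target - belief @ Q[:, action]
    Q[:, action] += alpha * td_error * belief   # credit assigned in proportion to belief
    return Q

# Illustrative interaction loop; observations and rewards are placeholders.
for t in range(100):
    action = int(np.argmax(belief @ Q)) if rng.random() > 0.1 else int(rng.integers(n_actions))
    obs = int(rng.integers(n_obs))      # placeholder observation from the environment
    reward = float(rng.normal())        # placeholder reward
    next_belief = belief_update(belief, action, obs)
    Q = q_update(Q, belief, next_belief, action, reward)
    belief = next_belief
```

In the paper's setting the matrices T and O would themselves be updated recursively from observations rather than fixed, which is what ties the Q-function's fixed point to the invariant distribution of the belief process.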

Original language: English (US)
Title of host publication: 2019 American Control Conference, ACC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2366-2371
Number of pages: 6
ISBN (Electronic): 9781538679265
State: Published - Jul 2019
Event: 2019 American Control Conference, ACC 2019 - Philadelphia, United States
Duration: Jul 10, 2019 - Jul 12, 2019

Publication series

Name: Proceedings of the American Control Conference
Volume: 2019-July
ISSN (Print): 0743-1619

Conference

Conference: 2019 American Control Conference, ACC 2019
Country: United States
City: Philadelphia
Period: 7/10/19 - 7/12/19

Fingerprint

  • Hidden Markov models
  • Learning algorithms
  • Maximum likelihood

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Yoon, H. J., Lee, D., & Hovakimyan, N. (2019). Hidden Markov model estimation-based Q-learning for partially observable Markov decision process. In 2019 American Control Conference, ACC 2019 (pp. 2366-2371). [8814849] (Proceedings of the American Control Conference; Vol. 2019-July). Institute of Electrical and Electronics Engineers Inc.
