### Abstract

The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When full state observation is available, Q-learning finds the optimal action-value function given the current state and action (Q-function). However, Q-learning can perform poorly when full state observation is not available. In this paper, we formulate POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates the POMDP parameters and the Q-function concurrently. We also show that the POMDP estimate converges to a set of stationary points of the maximum likelihood objective, and that the Q-function estimate converges to a fixed point satisfying the Bellman optimality equation weighted by the invariant distribution of the state belief determined by the HMM estimation process.
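The paper's method pairs a Bayes-filter belief update (the HMM estimation side) with a belief-weighted Q-learning step. The sketch below is not the authors' algorithm; it is a minimal illustration, under assumed known transition tensor `T` and observation matrix `O`, of the two ingredients the abstract names: recursive belief estimation and a TD update weighted by the current belief.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter step: posterior belief over hidden states after
    taking action a and receiving observation o.
    T[a] is the (S x S) transition matrix for action a; O[s, o] is
    the probability of observing o in state s."""
    pred = b @ T[a]          # predict: marginalize over current state
    post = pred * O[:, o]    # correct: weight by observation likelihood
    return post / post.sum() # renormalize to a probability vector

def q_update(Q, b, a, r, b_next, alpha=0.1, gamma=0.95):
    """Belief-weighted Q-learning step on a tabular Q (S x A).
    The TD target uses the expected max-Q under the next belief, and
    the update is distributed across states in proportion to the
    current belief."""
    target = r + gamma * np.max(b_next @ Q)  # expected optimal value at next belief
    td = target - b @ Q[:, a]                # belief-averaged TD error
    Q[:, a] += alpha * td * b                # credit states by belief weight
    return Q
```

In the paper, `T` and `O` are themselves estimated online by the recursive HMM procedure rather than assumed known, which is what yields the belief-distribution-weighted fixed point stated in the abstract.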

Original language | English (US) |
---|---|

Title of host publication | 2019 American Control Conference, ACC 2019 |

Publisher | Institute of Electrical and Electronics Engineers Inc. |

Pages | 2366-2371 |

Number of pages | 6 |

ISBN (Electronic) | 9781538679265 |

State | Published - Jul 2019 |

Event | 2019 American Control Conference, ACC 2019, Philadelphia, United States. Duration: Jul 10 2019 → Jul 12 2019 |

### Publication series

Name | Proceedings of the American Control Conference |
---|---|

Volume | 2019-July |

ISSN (Print) | 0743-1619 |

### Conference

Conference | 2019 American Control Conference, ACC 2019 |
---|---|

Country | United States |

City | Philadelphia |

Period | 7/10/19 → 7/12/19 |

### ASJC Scopus subject areas

- Electrical and Electronic Engineering

### Cite this

Yoon, H. J., Lee, D., & Hovakimyan, N. (2019). Hidden Markov model estimation-based Q-learning for partially observable Markov decision process. In *2019 American Control Conference, ACC 2019* (pp. 2366-2371). [8814849] (Proceedings of the American Control Conference; Vol. 2019-July). Institute of Electrical and Electronics Engineers Inc.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Scopus record: http://www.scopus.com/inward/record.url?scp=85072294610&partnerID=8YFLogxK