TY - GEN

T1 - A simulation-based method for aggregating Markov chains

AU - Deng, Kun

AU - Mehta, Prashant Girdharilal

AU - Meyn, Sean P.

PY - 2009/12/1

Y1 - 2009/12/1

N2 - This paper addresses model reduction for a Markov chain on a large state space. A simulation-based framework is introduced to perform state aggregation of the Markov chain based on observations of a single sample path. The Kullback-Leibler (K-L) divergence rate is employed as a metric to measure the distance between two stationary Markov chains. Model reduction with respect to this metric is cast as an infinite-horizon average-cost optimal control problem, so that an optimal policy corresponds to an optimal partition of the state space with respect to the K-L divergence rate. The optimal control problem is simplified in an approximate dynamic programming (ADP) framework: first, a relaxation of the policy space is performed, and based on this a parameterization of the set of optimal policies is introduced. This enables a stochastic approximation approach to computing the best policy within a given parameterized class. The algorithm can be implemented using a single sample path of the Markov chain, and its convergence is established using the ODE method. Examples illustrate the theoretical results and show remarkably low variance and fast convergence.

UR - http://www.scopus.com/inward/record.url?scp=77950842453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950842453&partnerID=8YFLogxK

DO - 10.1109/CDC.2009.5400533

M3 - Conference contribution

AN - SCOPUS:77950842453

SN - 9781424438716

T3 - Proceedings of the IEEE Conference on Decision and Control

SP - 4710

EP - 4716

BT - Proceedings of the 48th IEEE Conference on Decision and Control held jointly with 2009 28th Chinese Control Conference, CDC/CCC 2009

T2 - 48th IEEE Conference on Decision and Control held jointly with 2009 28th Chinese Control Conference, CDC/CCC 2009

Y2 - 15 December 2009 through 18 December 2009

ER -