TY - JOUR
T1 - On the choice of number of superstates in the aggregation of Markov chains
AU - Srivastava, Amber
AU - Velicheti, Raj K.
AU - Salapaka, Srinivasa M.
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/7
Y1 - 2022/7
N2 - Many studies involving large Markov chains require determining a smaller representative (aggregated) chain. Each superstate in the representative chain represents a group of related states in the original Markov chain. Typically, the choice of number of superstates in the aggregated chain is ambiguous, and based on the limited prior know-how. This paper presents a structured methodology of determining the best candidate for the number of superstates. It achieves this by comparing aggregated chains of different sizes. To facilitate this comparison a new quantity called heterogeneity of a superstate is developed, and subsequently it is used to establish the notion of marginal return of an aggregated chain. In particular, the notion of marginal return captures the decrease in the heterogeneity upon a unit increase in the number of superstates in the aggregated chain. Maximum Entropy Principle (MEP), from statistical mechanics, justifies the developed notion of marginal return, as well as the quantification of heterogeneity. Through simulations on synthetic Markov chains, where the number of superstates are known a priori, it is observed that the aggregated chain with the largest marginal return identifies this number. In case of Markov chains that model real-life scenarios it is shown that the aggregated model with the largest marginal return identifies an inherent structure unique to the scenario being modelled; thus, substantiating the efficacy of the proposed methodology.
AB - Many studies involving large Markov chains require determining a smaller representative (aggregated) chain. Each superstate in the representative chain represents a group of related states in the original Markov chain. Typically, the choice of number of superstates in the aggregated chain is ambiguous, and based on the limited prior know-how. This paper presents a structured methodology of determining the best candidate for the number of superstates. It achieves this by comparing aggregated chains of different sizes. To facilitate this comparison a new quantity called heterogeneity of a superstate is developed, and subsequently it is used to establish the notion of marginal return of an aggregated chain. In particular, the notion of marginal return captures the decrease in the heterogeneity upon a unit increase in the number of superstates in the aggregated chain. Maximum Entropy Principle (MEP), from statistical mechanics, justifies the developed notion of marginal return, as well as the quantification of heterogeneity. Through simulations on synthetic Markov chains, where the number of superstates are known a priori, it is observed that the aggregated chain with the largest marginal return identifies this number. In case of Markov chains that model real-life scenarios it is shown that the aggregated model with the largest marginal return identifies an inherent structure unique to the scenario being modelled; thus, substantiating the efficacy of the proposed methodology.
UR - http://www.scopus.com/inward/record.url?scp=85133927974&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133927974&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2022.05.019
DO - 10.1016/j.patrec.2022.05.019
M3 - Article
AN - SCOPUS:85133927974
SN - 0167-8655
VL - 159
SP - 181
EP - 188
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -