Abstract
Clustering longitudinal data is often of interest, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based method for clustering objects that are observed over time. The proposed model can be viewed as an extension of the normal mixture model for clustering to longitudinal data. While existing models account only for clustering effects, we model the distribution of the observed values of each object as a blend of a cluster effect and an individual effect, which also gives an estimate of how much the behavior of an object is determined by the cluster to which it belongs. Further, it is important to detect how explanatory variables affect the clustering. An advantage of our method is that it can handle multiple explanatory variables of any type through a linear modeling of the cluster transition probabilities. We implement the generalized EM algorithm using several recursive relationships to greatly decrease the computational cost. The accuracy of our estimation method is illustrated in a simulation study, and U.S. Congressional data are analyzed.
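The two modeling ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's specification: the convex weight `lam`, the use of an object's previous value as the individual effect, the coefficient layout `beta_from_j`, and both function names are assumptions made here for illustration. The sketch shows (a) cluster transition probabilities obtained from a linear predictor through a multinomial logit, and (b) a mean formed by blending a cluster effect with an individual effect.

```python
import math

def transition_probs(x, beta_from_j):
    """Multinomial-logit probabilities of moving to each cluster.

    x: covariate values, with a leading 1.0 acting as an intercept.
    beta_from_j: one coefficient list per destination cluster
                 (hypothetical layout, not taken from the paper).
    """
    # Linear predictor for each destination cluster.
    scores = [sum(b * xi for b, xi in zip(beta_k, x)) for beta_k in beta_from_j]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blended_mean(lam, cluster_mean, previous_value):
    """Convex blend of a cluster effect and an individual effect.

    lam near 1 means the object follows its cluster; lam near 0 means it
    follows its own past behavior (assumed form of the blending).
    """
    return lam * cluster_mean + (1.0 - lam) * previous_value

# Three destination clusters, one covariate plus an intercept.
probs = transition_probs([1.0, 0.5], [[0.0, 0.0], [1.0, -2.0], [-1.0, 2.0]])
mean = blended_mean(0.7, 2.0, 1.0)
```

In this toy setup the three linear predictors coincide, so the transition probabilities are equal; changing the covariate value shifts probability toward one cluster or another, which is the sense in which explanatory variables affect the clustering.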
Original language | English (US) |
---|---|
Pages (from-to) | 205-233 |
Number of pages | 29 |
Journal | Statistica Sinica |
Volume | 26 |
Issue number | 1 |
State | Published - Jan 2016 |
Keywords
- Cluster analysis
- EM algorithm
- Multinomial logistic regression
- Normal mixture models
- Time series
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty