Model-based longitudinal clustering with varying cluster assignments

Research output: Contribution to journal (Article)

Abstract

It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based clustering method for clustering objects that are observed over time. The proposed model can be viewed as an extension of the normal mixture model for clustering to longitudinal data. While existing models only account for clustering effects, we propose modeling the distribution of the observed values of each object as a blending of a cluster effect and an individual effect, hence also giving an estimate of how much the behavior of an object is determined by the cluster to which it belongs. Further, it is important to detect how explanatory variables affect the clustering. An advantage of our method is that it can handle multiple explanatory variables of any type through a linear modeling of the cluster transition probabilities. We implement the generalized EM algorithm using several recursive relationships to greatly decrease the computational cost. The accuracy of our estimation method is illustrated in a simulation study, and U.S. Congressional data is analyzed.

Original language: English (US)
Pages (from-to): 205-233
Number of pages: 29
Journal: Statistica Sinica
Volume: 26
Issue number: 1
DOI: 10.5705/ss.2014.205
State: Published - Jan 2016

Keywords

  • Cluster analysis
  • EM algorithm
  • Multinomial logistic regression
  • Normal mixture models
  • Time series

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Model-based longitudinal clustering with varying cluster assignments. / Sewell, Daniel K.; Chen, Yuguo; Bernhard, William T; Sulkin, Tracy E.

In: Statistica Sinica, Vol. 26, No. 1, 01.2016, p. 205-233.

@article{41f3fc4e0d824fcdbe02df1b81a376c3,
title = "Model-based longitudinal clustering with varying cluster assignments",
abstract = "It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based clustering method for clustering objects that are observed over time. The proposed model can be viewed as an extension of the normal mixture model for clustering to longitudinal data. While existing models only account for clustering effects, we propose modeling the distribution of the observed values of each object as a blending of a cluster effect and an individual effect, hence also giving an estimate of how much the behavior of an object is determined by the cluster to which it belongs. Further, it is important to detect how explanatory variables affect the clustering. An advantage of our method is that it can handle multiple explanatory variables of any type through a linear modeling of the cluster transition probabilities. We implement the generalized EM algorithm using several recursive relationships to greatly decrease the computational cost. The accuracy of our estimation method is illustrated in a simulation study, and U.S. Congressional data is analyzed.",
keywords = "Cluster analysis, EM algorithm, Multinomial logistic regression, Normal mixture models, Time series",
author = "Sewell, Daniel K. and Chen, Yuguo and Bernhard, William T and Sulkin, Tracy E",
year = "2016",
month = jan,
doi = "10.5705/ss.2014.205",
language = "English (US)",
volume = "26",
pages = "205--233",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "1",
}

TY  - JOUR
T1  - Model-based longitudinal clustering with varying cluster assignments
AU  - Sewell, Daniel K.
AU  - Chen, Yuguo
AU  - Bernhard, William T
AU  - Sulkin, Tracy E
PY  - 2016/1
Y1  - 2016/1
AB  - It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based clustering method for clustering objects that are observed over time. The proposed model can be viewed as an extension of the normal mixture model for clustering to longitudinal data. While existing models only account for clustering effects, we propose modeling the distribution of the observed values of each object as a blending of a cluster effect and an individual effect, hence also giving an estimate of how much the behavior of an object is determined by the cluster to which it belongs. Further, it is important to detect how explanatory variables affect the clustering. An advantage of our method is that it can handle multiple explanatory variables of any type through a linear modeling of the cluster transition probabilities. We implement the generalized EM algorithm using several recursive relationships to greatly decrease the computational cost. The accuracy of our estimation method is illustrated in a simulation study, and U.S. Congressional data is analyzed.
KW  - Cluster analysis
KW  - EM algorithm
KW  - Multinomial logistic regression
KW  - Normal mixture models
KW  - Time series
UR  - http://www.scopus.com/inward/record.url?scp=85011356011&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85011356011&partnerID=8YFLogxK
U2  - 10.5705/ss.2014.205
DO  - 10.5705/ss.2014.205
M3  - Article
AN  - SCOPUS:85011356011
VL  - 26
SP  - 205
EP  - 233
JO  - Statistica Sinica
JF  - Statistica Sinica
SN  - 1017-0405
IS  - 1
ER  -