TY - GEN
T1 - SPARTan
T2 - 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
AU - Perros, Ioakeim
AU - Papalexakis, Evangelos E.
AU - Wang, Fei
AU - Vuduc, Richard
AU - Searles, Elizabeth
AU - Thompson, Michael
AU - Sun, Jimeng
N1 - Publisher Copyright: © 2017 Association for Computing Machinery.
PY - 2017/8/13
Y1 - 2017/8/13
N2 - In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there is no meaningful way to align their clinical records across time points for analysis purposes. To handle such data, the state-of-the-art tensor model is the so-called PARAFAC2, which yields interpretable and robust output and can naturally handle sparse data. However, its main limitation up to now has been the lack of efficient algorithms that can handle large-scale datasets. In this work, we fill this gap by developing a scalable method to compute the PARAFAC2 decomposition of large and sparse datasets, called SPARTan. Our method exploits special structure within PARAFAC2, leading to a novel algorithmic reformulation that is both faster (in absolute time) and more memory-efficient than prior work. We evaluate SPARTan on both synthetic and real datasets, showing 22× performance gains over the best previous implementation and also handling larger problem instances for which the baseline fails. Furthermore, we are able to apply SPARTan to the mining of temporally-evolving phenotypes on data taken from real and medically complex pediatric patients. The clinical meaningfulness of the phenotypes identified in this process, as well as their temporal evolution over time for several patients, have been endorsed by clinical experts.
AB - In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there is no meaningful way to align their clinical records across time points for analysis purposes. To handle such data, the state-of-the-art tensor model is the so-called PARAFAC2, which yields interpretable and robust output and can naturally handle sparse data. However, its main limitation up to now has been the lack of efficient algorithms that can handle large-scale datasets. In this work, we fill this gap by developing a scalable method to compute the PARAFAC2 decomposition of large and sparse datasets, called SPARTan. Our method exploits special structure within PARAFAC2, leading to a novel algorithmic reformulation that is both faster (in absolute time) and more memory-efficient than prior work. We evaluate SPARTan on both synthetic and real datasets, showing 22× performance gains over the best previous implementation and also handling larger problem instances for which the baseline fails. Furthermore, we are able to apply SPARTan to the mining of temporally-evolving phenotypes on data taken from real and medically complex pediatric patients. The clinical meaningfulness of the phenotypes identified in this process, as well as their temporal evolution over time for several patients, have been endorsed by clinical experts.
KW - PARAFAC2
KW - Phenotyping
KW - Sparse tensor factorization
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85029022472&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029022472&partnerID=8YFLogxK
U2 - 10.1145/3097983.3098014
DO - 10.1145/3097983.3098014
M3 - Conference contribution
AN - SCOPUS:85029022472
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 376
EP - 384
BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 13 August 2017 through 17 August 2017
ER -