COPA: Constrained PARAFAC2 for sparse & large datasets

Ardavan Afshar, Elizabeth Searles, Ioakeim Perros, Joyce Ho, Evangelos E. Papalexakis, Jimeng Sun

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with the varying number of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, are needed to be imposed for interpretable temporal modeling and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and alternating direction method of multiplier (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36 faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert.

Original languageEnglish (US)
Title of host publicationCIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
EditorsNorman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
PublisherAssociation for Computing Machinery
Pages793-802
Number of pages10
ISBN (Electronic)9781450360142
DOIs
StatePublished - Oct 17 2018
Externally publishedYes
Event27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: Oct 22 2018Oct 26 2018

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Other

Other27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country/TerritoryItaly
CityTorino
Period10/22/1810/26/18

Keywords

  • Computational Phenotyping
  • Tensor Factorization
  • Unsupervised Learning

ASJC Scopus subject areas

  • Business, Management and Accounting(all)
  • Decision Sciences(all)

Fingerprint

Dive into the research topics of 'COPA: Constrained PARAFAC2 for sparse & large datasets'. Together they form a unique fingerprint.

Cite this