TY - GEN
T1 - TASTE
AU - Afshar, Ardavan
AU - Perros, Ioakeim
AU - Park, Haesun
AU - Defilippi, Christopher
AU - Yan, Xiaowei
AU - Stewart, Walter
AU - Ho, Joyce
AU - Sun, Jimeng
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/2/4
Y1 - 2020/2/4
N2 - Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.
AB - Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.
KW - Computational Phenotyping
KW - Predictive modeling
KW - Tensor Factorization
UR - http://www.scopus.com/inward/record.url?scp=85082770470&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082770470&partnerID=8YFLogxK
U2 - 10.1145/3368555.3384464
DO - 10.1145/3368555.3384464
M3 - Conference contribution
C2 - 33659966
AN - SCOPUS:85082770470
T3 - ACM CHIL 2020 - Proceedings of the 2020 ACM Conference on Health, Inference, and Learning
SP - 193
EP - 203
BT - ACM CHIL 2020 - Proceedings of the 2020 ACM Conference on Health, Inference, and Learning
PB - Association for Computing Machinery
T2 - 2020 ACM Conference on Health, Inference, and Learning, CHIL 2020
Y2 - 2 April 2020 through 4 April 2020
ER -