TY - GEN
T1 - Extracting phenotypes from patient claim records using nonnegative tensor factorization
AU - Ho, Joyce C.
AU - Ghosh, Joydeep
AU - Sun, Jimeng
PY - 2014
Y1 - 2014
N2 - Electronic health records (EHRs) are becoming an increasingly important source of patient information. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping the EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals. In this paper, we use Limestone, a nonnegative tensor factorization method, to derive phenotype candidates from claims data with virtually no human supervision. Limestone naturally represents the interactions between diagnoses and procedures among patients using tensors (a generalization of matrices). The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and procedures. To the best of our knowledge, this is the first study that successfully extracts useful phenotypes by applying sparse nonnegative tensor factorization to a large, public-domain EHR dataset covering a broad range of diseases. Our experiments demonstrate the interpretability and the promise of high-throughput phenotypes generated from tensor factorization.
AB - Electronic health records (EHRs) are becoming an increasingly important source of patient information. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping the EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals. In this paper, we use Limestone, a nonnegative tensor factorization method, to derive phenotype candidates from claims data with virtually no human supervision. Limestone naturally represents the interactions between diagnoses and procedures among patients using tensors (a generalization of matrices). The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and procedures. To the best of our knowledge, this is the first study that successfully extracts useful phenotypes by applying sparse nonnegative tensor factorization to a large, public-domain EHR dataset covering a broad range of diseases. Our experiments demonstrate the interpretability and the promise of high-throughput phenotypes generated from tensor factorization.
KW - dimensionality reduction
KW - EHR phenotyping
KW - tensor factorization
UR - http://www.scopus.com/inward/record.url?scp=84905230181&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905230181&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-09891-3_14
DO - 10.1007/978-3-319-09891-3_14
M3 - Conference contribution
AN - SCOPUS:84905230181
SN - 9783319098906
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 142
EP - 151
BT - Brain Informatics and Health - International Conference, BIH 2014, Proceedings
PB - Springer
T2 - 2014 International Conference on Brain Informatics and Health, BIH 2014
Y2 - 11 August 2014 through 14 August 2014
ER -