Extracting phenotypes from patient claim records using nonnegative tensor factorization

Joyce C. Ho, Joydeep Ghosh, Jimeng Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Electronic health records (EHRs) are becoming an increasingly important source of patient information. Unfortunately, EHR data do not always map directly and reliably to the medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims to map EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals. In this paper, we use Limestone, a nonnegative tensor factorization method, to derive phenotype candidates from claims data with virtually no human supervision. Limestone naturally represents the interactions between diagnoses and procedures among patients using tensors (a generalization of matrices). The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and procedures. To the best of our knowledge, this is the first study that successfully extracts useful phenotypes by applying sparse nonnegative tensor factorization to a large, public-domain EHR dataset covering a broad range of diseases. Our experiments demonstrate the interpretability and the promise of high-throughput phenotypes generated from tensor factorization.
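To make the tensor representation concrete, the following is a minimal sketch, not Limestone itself. The abstract does not give Limestone's objective or its sparsity mechanism, so this sketch swaps in a plain least-squares nonnegative CP factorization with multiplicative updates and does not enforce sparsity. The integer-coded claims, the function names (build_tensor, nonneg_cp, reconstruct, squared_error) and the toy data are all hypothetical, chosen only for illustration.

```python
import random

def build_tensor(claims):
    """Count (patient, diagnosis, procedure) co-occurrences in a sparse dict."""
    counts = {}
    for key in claims:
        counts[key] = counts.get(key, 0) + 1
    return counts

def gram(F, rank):
    """Return F^T F for a factor matrix stored as a list of rows."""
    return [[sum(row[r] * row[s] for row in F) for s in range(rank)]
            for r in range(rank)]

def nonneg_cp(counts, shape, rank, iters=200, seed=0, eps=1e-9):
    """Least-squares nonnegative CP via multiplicative updates (an assumption,
    not the paper's algorithm). Returns one factor matrix per tensor mode."""
    rng = random.Random(seed)
    # Strictly positive start; multiplicative updates keep entries >= 0.
    factors = [[[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
               for n in shape]
    for _ in range(iters):
        for mode in range(3):
            a, b = [m for m in range(3) if m != mode]
            # Numerator: tensor times the other two factors, over nonzeros only.
            num = [[0.0] * rank for _ in range(shape[mode])]
            for idx, x in counts.items():
                for r in range(rank):
                    num[idx[mode]][r] += x * factors[a][idx[a]][r] * factors[b][idx[b]][r]
            # Denominator uses the elementwise product of the two Gram matrices.
            ga, gb = gram(factors[a], rank), gram(factors[b], rank)
            had = [[ga[r][s] * gb[r][s] for s in range(rank)] for r in range(rank)]
            for i, row in enumerate(factors[mode]):
                den = [sum(row[s] * had[s][r] for s in range(rank)) for r in range(rank)]
                factors[mode][i] = [row[r] * num[i][r] / (den[r] + eps)
                                    for r in range(rank)]
    return factors

def reconstruct(factors, idx):
    """Model value at one (patient, diagnosis, procedure) cell."""
    A, B, C = factors
    i, j, k = idx
    return sum(A[i][r] * B[j][r] * C[k][r] for r in range(len(A[0])))

def squared_error(counts, shape, factors):
    """Sum of squared residuals over every cell, including the zeros."""
    return sum((counts.get((i, j, k), 0) - reconstruct(factors, (i, j, k))) ** 2
               for i in range(shape[0])
               for j in range(shape[1])
               for k in range(shape[2]))

# Toy data: patients 0-1 share diagnosis 0 / procedure 0,
# patients 2-3 share diagnosis 1 / procedure 1.
claims = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (2, 1, 1), (3, 1, 1), (3, 1, 1)]
shape = (4, 2, 2)
tensor = build_tensor(claims)
factors = nonneg_cp(tensor, shape, rank=2)
for name, F in zip(("patients", "diagnoses", "procedures"), factors):
    print(name, [[round(v, 2) for v in row] for row in F])
```

Each rank-one component (one column across the three factor matrices) plays the role of a phenotype candidate in this sketch: its large patient entries suggest a patient cluster, and its large diagnosis and procedure entries suggest what that cluster has in common. Storing the tensor as a dict of nonzero counts is a design choice that reflects how sparse claims data tend to be.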

Original language: English (US)
Title of host publication: Brain Informatics and Health - International Conference, BIH 2014, Proceedings
Number of pages: 10
ISBN (Print): 9783319098906
State: Published - Jan 1 2014
Externally published: Yes
Event: 2014 International Conference on Brain Informatics and Health, BIH 2014 - Warsaw, Poland
Duration: Aug 11 2014 to Aug 14 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8609 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 2014 International Conference on Brain Informatics and Health, BIH 2014


  • dimensionality reduction
  • EHR phenotyping
  • tensor factorization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

