TY - GEN
T1 - CP tensor decomposition with cannot-link intermode constraints
AU - Henderson, Jette
AU - Malin, Bradley A.
AU - Denny, Joshua C.
AU - Kho, Abel N.
AU - Sun, Jimeng
AU - Ghosh, Joydeep
AU - Ho, Joyce C.
N1 - Funding Information:
Supported by NSF grants 1417697, 1418511, and 1418504 and NIH grant 1K01LM012924-01. Author affiliations: CognitiveScale (jette.henderson@gmail.com); Vanderbilt University; Northwestern University; Georgia Technical Institute; University of Texas, Austin; Emory University.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Tensor factorization is a methodology that is applied in a variety of fields, ranging from climate modeling to medical informatics. A tensor is an n-way array that captures the relationship between n objects. These multiway arrays can be factored to study the underlying bases present in the data. Two challenges arising in tensor factorization are 1) the resulting factors can be noisy and highly overlapping with one another and 2) they may not map to insights within a domain. However, incorporating supervision to increase the number of insightful factors can be costly in terms of the time and domain expertise necessary for gathering labels or domain-specific constraints. To meet these challenges, we introduce CANDECOMP/PARAFAC (CP) tensor factorization with Cannot-Link Intermode Constraints (CP-CLIC), a framework that achieves succinct, diverse, interpretable factors. This is accomplished by gradually learning constraints that are verified with auxiliary information during the decomposition process. We demonstrate CP-CLIC’s potential to extract sparse, diverse, and interpretable factors through experiments on simulated data and a real-world application in medical informatics.
UR - http://www.scopus.com/inward/record.url?scp=85066066572&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066066572&partnerID=8YFLogxK
U2 - 10.1137/1.9781611975673.80
DO - 10.1137/1.9781611975673.80
M3 - Conference contribution
C2 - 31198618
AN - SCOPUS:85066066572
T3 - SIAM International Conference on Data Mining, SDM 2019
SP - 711
EP - 719
BT - SIAM International Conference on Data Mining, SDM 2019
PB - Society for Industrial and Applied Mathematics Publications
T2 - 19th SIAM International Conference on Data Mining, SDM 2019
Y2 - 2 May 2019 through 4 May 2019
ER -