TY - GEN
T1 - Sparse hierarchical tucker factorization and its application to healthcare
AU - Perros, Ioakeim
AU - Chen, Robert
AU - Vuduc, Richard
AU - Sun, Jimeng
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/5
Y1 - 2016/1/5
N2 - We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a domain expert. However, Sparse H-Tucker uses a nested sampling technique to overcome a key scalability problem in Hierarchical Tucker, which is the creation of an unwieldy intermediate dense core tensor, the result of our approach is a faster, more space-efficient, and more accurate method. We test our method on a real healthcare dataset, which is collected from 30K patients and results in an 18th order sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the full data set on a single multi-threaded machine. It can also do so more accurately and in less time than the state-of-the-art: on a 12th order subset of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than a previously state-of-the-art method. Moreover, we observe that Sparse H-Tucker scales nearly linearly in the number of non-zero tensor elements. The resulting model also provides an interpretable disease hierarchy, which is confirmed by a clinical expert.
AB - We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a domain expert. However, Sparse H-Tucker uses a nested sampling technique to overcome a key scalability problem in Hierarchical Tucker, which is the creation of an unwieldy intermediate dense core tensor, the result of our approach is a faster, more space-efficient, and more accurate method. We test our method on a real healthcare dataset, which is collected from 30K patients and results in an 18th order sparse data tensor. Unlike competing methods, Sparse H-Tucker can analyze the full data set on a single multi-threaded machine. It can also do so more accurately and in less time than the state-of-the-art: on a 12th order subset of the input data, Sparse H-Tucker is 18x more accurate and 7.5x faster than a previously state-of-the-art method. Moreover, we observe that Sparse H-Tucker scales nearly linearly in the number of non-zero tensor elements. The resulting model also provides an interpretable disease hierarchy, which is confirmed by a clinical expert.
UR - http://www.scopus.com/inward/record.url?scp=84963556335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963556335&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2015.29
DO - 10.1109/ICDM.2015.29
M3 - Conference contribution
AN - SCOPUS:84963556335
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 943
EP - 948
BT - Proceedings - 15th IEEE International Conference on Data Mining, ICDM 2015
A2 - Aggarwal, Charu
A2 - Zhou, Zhi-Hua
A2 - Tuzhilin, Alexander
A2 - Xiong, Hui
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Data Mining, ICDM 2015
Y2 - 14 November 2015 through 17 November 2015
ER -