TY - GEN
T1 - Meta-Graph Based HIN Spectral Embedding
T2 - 18th IEEE International Conference on Data Mining, ICDM 2018
AU - Yang, Carl
AU - Feng, Yichen
AU - Li, Pan
AU - Shi, Yu
AU - Han, Jiawei
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
N2 - Heterogeneous information networks (HINs) have drawn significant research attention recently, due to their power in modeling multi-typed, multi-relational data and facilitating various downstream applications. In this decade, many algorithms have been developed for HIN modeling, including traditional similarity measures and recent embedding techniques. Most algorithms on HINs leverage meta-graphs or meta-paths (special cases of meta-graphs) to capture various semantics. Given an arbitrary set of meta-graphs, existing algorithms either consider them equally important or study their relative importance through supervised learning. Their performance relies largely on prior knowledge and labeled data. While unsupervised embedding has been shown to be a fundamental solution for various homogeneous network mining tasks, it is a much harder problem for HINs due to the presence of various meta-graphs. In this work, we propose to study the utility of different meta-graphs, as well as how to simultaneously leverage multiple meta-graphs for HIN embedding in an unsupervised manner. Motivated by prolific research on homogeneous networks, especially spectral graph theory, we first conduct a systematic empirical study on the spectrum and embedding quality of different meta-graphs on multiple HINs, which leads to an efficient method of meta-graph assessment. It also helps us gain valuable insight into the higher-order organization of HINs and indicates a practical way of selecting useful embedding dimensions. Further, we explore the challenges of combining multiple meta-graphs to capture the multi-dimensional semantics in HINs through reasoning from mathematical geometry, and arrive at an embedding compression method based on an autoencoder with l2,1-loss, which finds the most informative meta-graphs and embeddings in an end-to-end unsupervised manner. Finally, empirical analysis suggests a unified workflow to close the gap between our meta-graph assessment and combination methods. To the best of our knowledge, this is the first research effort to provide rich theoretical and empirical analyses on the utility of meta-graphs and their combinations, especially regarding HIN embedding. Extensive experimental comparisons with various state-of-the-art neural-network-based embedding methods on multiple real-world HINs demonstrate the effectiveness and efficiency of our framework in finding useful meta-graphs and generating high-quality HIN embeddings.
KW - Heterogeneous data
KW - Network embedding
KW - Spectral analysis
UR - http://www.scopus.com/inward/record.url?scp=85061367378&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061367378&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2018.00081
DO - 10.1109/ICDM.2018.00081
M3 - Conference contribution
AN - SCOPUS:85061367378
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 657
EP - 666
BT - 2018 IEEE International Conference on Data Mining, ICDM 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 November 2018 through 20 November 2018
ER -