TY - GEN
T1 - PReP
T2 - 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
AU - Shi, Yu
AU - Chan, Po Wei
AU - Zhuang, Honglei
AU - Gui, Huan
AU - Han, Jiawei
N1 - Funding Information:
Acknowledgments. We thank our colleagues and friends for the enlightening discussions: Jason Jian Ge, Jiasen Yang, Carl Ji Yang, and many members of the Data Mining Group at UIUC. We also thank the anonymous reviewers for their insightful comments. This work was sponsored in part by the U.S. Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), National Science Foundation IIS-1320617 and IIS 16-18481, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov). The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies of the U.S. Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Publisher Copyright:
© 2017 Copyright held by the owner/author(s).
PY - 2017/8/13
Y1 - 2017/8/13
AB - As a powerful representation paradigm for networked and multi-typed data, the heterogeneous information network (HIN) is ubiquitous. Meanwhile, defining proper relevance measures has always been a fundamental problem and of great pragmatic importance for network mining tasks. Inspired by our probabilistic interpretation of existing path-based relevance measures, we propose to study HIN relevance from a probabilistic perspective. We also identify, from real-world data, and propose to model cross-meta-path synergy, which is a characteristic important for defining path-based HIN relevance and has not been modeled by existing methods. A generative model is established to derive a novel path-based relevance measure, which is data-driven and tailored for each HIN. We develop an inference algorithm to find the maximum a posteriori (MAP) estimate of the model parameters, which entails non-trivial tricks. Experiments on two real-world datasets demonstrate the effectiveness of the proposed model and relevance measure.
KW - Graph mining
KW - Heterogeneous information networks
KW - Meta-paths
KW - Relevance measures
UR - http://www.scopus.com/inward/record.url?scp=85029025692&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029025692&partnerID=8YFLogxK
U2 - 10.1145/3097983.3097990
DO - 10.1145/3097983.3097990
M3 - Conference contribution
C2 - 30221026
AN - SCOPUS:85029025692
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 425
EP - 434
BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 13 August 2017 through 17 August 2017
ER -