TY - GEN
T1 - Distant meta-path similarities for text-based heterogeneous information networks
AU - Wang, Chenguang
AU - Song, Yangqiu
AU - Li, Haoran
AU - Sun, Yizhou
AU - Zhang, Ming
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - Measuring network similarity is a fundamental data mining problem. The mainstream similarity measures mainly leverage the structural information regarding the entities in the network without considering the network semantics. In the real world, heterogeneous information networks (HINs) with rich semantics are ubiquitous. However, existing network similarity measures do not generalize well to HINs because they fail to capture HIN semantics. The meta-path has been proposed and demonstrated as an appropriate way to represent semantics in HINs. Therefore, the original meta-path based similarities (e.g., PathSim and KnowSim) have been successful in computing entity proximity in HINs. Their intuition is that the more instances of meta-path(s) there are between two entities, the more similar the entities are. Thus, the original meta-path similarity only applies to computing the proximity of two neighboring (connected) entities. In this paper, we propose the distant meta-path similarity, which is able to capture HIN semantics between two distant (isolated) entities to provide more meaningful entity proximity. The main idea is that even if two entities share no neighboring entities (i.e., no meta-path instances connect them), the more similar their neighboring entities are, the more similar the two entities should be. We then find the optimal distant meta-path similarity by exploring the similarity hypothesis space based on different theoretical foundations. We show the state-of-the-art similarity performance of distant meta-path similarity on two text-based HINs and make the datasets publicly available.
UR - http://www.scopus.com/inward/record.url?scp=85037378598&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037378598&partnerID=8YFLogxK
U2 - 10.1145/3132847.3133029
DO - 10.1145/3132847.3133029
M3 - Conference contribution
AN - SCOPUS:85037378598
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1629
EP - 1638
BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Y2 - 6 November 2017 through 10 November 2017
ER -