Abstract
Measuring network similarity is a fundamental data mining problem. Mainstream similarity measures mainly leverage structural information about the entities in the network without considering network semantics. In the real world, heterogeneous information networks (HINs) with rich semantics are ubiquitous. However, existing network similarity measures do not generalize well to HINs because they fail to capture HIN semantics. The meta-path has been proposed and demonstrated as an effective way to represent semantics in HINs. Accordingly, the original meta-path based similarities (e.g., PathSim and KnowSim) have been successful in computing entity proximity in HINs. The intuition is that the more meta-path instances there are between two entities, the more similar the entities are. As a result, the original meta-path similarity applies only to computing the proximity of two neighboring (connected) entities. In this paper, we propose the distant meta-path similarity, which captures HIN semantics between two distant (isolated) entities to provide more meaningful entity proximity. The main idea is that even if two entities share no neighboring entities (i.e., no meta-path instances connect them), the more similar their neighboring entities are, the more similar the two entities should be. We then find the optimal distant meta-path similarity by exploring the similarity hypothesis space based on different theoretical foundations. We show the state-of-the-art similarity performance of distant meta-path similarity on two text-based HINs and make the datasets publicly available.
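The intuition in the abstract can be illustrated with a small sketch. The first part computes PathSim over a symmetric Document-Word-Document meta-path and shows that it returns zero for two documents with no shared words. The second part is only a hypothetical stand-in for the "distant" idea: it scores two documents by the count-weighted average similarity of their neighboring words, where word similarity comes from an assumed Word-Topic-Word meta-path. The toy matrices, the `neighbor_similarity` function, and the averaging scheme are assumptions made for illustration and are not the measures defined in the paper.

```python
from typing import List

Matrix = List[List[float]]


def commuting_matrix(adj: Matrix) -> Matrix:
    """Count instances of the symmetric meta-path A-B-A, i.e. adj * adj^T."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in adj]
            for row_i in adj]


def pathsim(m: Matrix, x: int, y: int) -> float:
    """PathSim score: 2 * M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def neighbor_similarity(cx: List[float], cy: List[float], sim: Matrix) -> float:
    """Hypothetical score: count-weighted mean similarity over neighbor pairs."""
    total = sum(cx[i] * cy[j] * sim[i][j]
                for i in range(len(cx)) for j in range(len(cy)))
    norm = sum(cx) * sum(cy)
    return total / norm if norm else 0.0


# Toy document-by-word counts (assumed data); d0 and d2 share no words.
doc_word = [
    [2, 1, 0, 0],  # d0
    [1, 1, 0, 0],  # d1
    [0, 0, 1, 2],  # d2
]
# Toy word-by-topic links (assumed data); w0, w1, w2 share topic t0.
word_topic = [
    [1, 0],  # w0
    [1, 0],  # w1
    [1, 0],  # w2
    [0, 1],  # w3
]

doc_m = commuting_matrix(doc_word)
word_m = commuting_matrix(word_topic)
n_words = len(word_topic)
word_sim = [[pathsim(word_m, i, j) for j in range(n_words)]
            for i in range(n_words)]

connected = pathsim(doc_m, 0, 1)   # d0 and d1 share words
isolated = pathsim(doc_m, 0, 2)    # no connecting path instance, so 0.0
distant = neighbor_similarity(doc_word[0], doc_word[2], word_sim)

print(connected, isolated, distant)
```

In this toy network, PathSim cannot distinguish d2 from any other unconnected document, while the neighbor-based score is positive because the words of d0 and d2 are themselves related through a shared topic.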
Original language | English (US) |
---|---|
Title of host publication | CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management |
Publisher | Association for Computing Machinery |
Pages | 1629-1638 |
Number of pages | 10 |
ISBN (Electronic) | 9781450349185 |
DOIs | 10.1145/3132847.3133029 |
State | Published - Nov 6 2017 |
Event | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore; Duration: Nov 6 2017 → Nov 10 2017 |
Publication series
Name | International Conference on Information and Knowledge Management, Proceedings |
---|---|
Volume | Part F131841 |
Other
Other | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 |
---|---|
Country | Singapore |
City | Singapore |
Period | 11/6/17 → 11/10/17 |
ASJC Scopus subject areas
- Business, Management and Accounting(all)
- Decision Sciences(all)
Cite this
Distant meta-path similarities for text-based heterogeneous information networks. / Wang, Chenguang; Song, Yangqiu; Li, Haoran; Sun, Yizhou; Zhang, Ming; Han, Jiawei.
CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. Association for Computing Machinery, 2017. p. 1629-1638 (International Conference on Information and Knowledge Management, Proceedings; Vol. Part F131841).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Distant meta-path similarities for text-based heterogeneous information networks
AU - Wang, Chenguang
AU - Song, Yangqiu
AU - Li, Haoran
AU - Sun, Yizhou
AU - Zhang, Ming
AU - Han, Jiawei
PY - 2017/11/6
Y1 - 2017/11/6
UR - http://www.scopus.com/inward/record.url?scp=85037378598&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037378598&partnerID=8YFLogxK
U2 - 10.1145/3132847.3133029
DO - 10.1145/3132847.3133029
M3 - Conference contribution
AN - SCOPUS:85037378598
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1629
EP - 1638
BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
ER -