Distant meta-path similarities for text-based heterogeneous information networks

Chenguang Wang, Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Measuring network similarity is a fundamental data mining problem. Mainstream similarity measures mainly leverage structural information about the entities in the network without considering the network semantics. In the real world, heterogeneous information networks (HINs) with rich semantics are ubiquitous. However, existing network similarity measures do not generalize well to HINs because they fail to capture HIN semantics. The meta-path has been proposed and demonstrated as an effective way to represent semantics in HINs. Accordingly, the original meta-path based similarities (e.g., PathSim and KnowSim) have been successful in computing entity proximity in HINs. The intuition is that the more meta-path instances there are between two entities, the more similar the entities are. Thus the original meta-path similarity applies only to computing the proximity of two neighboring (connected) entities. In this paper, we propose the distant meta-path similarity, which is able to capture HIN semantics between two distant (isolated) entities and so provide more meaningful entity proximity. The main idea is that even if two entities share no neighborhood entities (i.e., no meta-path instances connect them), the more similar their neighborhood entities are, the more similar the two entities should be. We then find the optimal distant meta-path similarity by exploring the similarity hypothesis space based on different theoretical foundations. We show the state-of-the-art similarity performance of distant meta-path similarity on two text-based HINs and make the datasets publicly available.
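
The contrast drawn in the abstract can be illustrated with a small sketch. The PathSim function below follows the standard published definition for a symmetric meta-path, 2·M[x][y] / (M[x][x] + M[y][y]), where M counts meta-path instances. The second function is only an assumed stand-in for the "distant" idea: it scores two entities by how similar their neighbors are, using a soft cosine over a neighbor-to-neighbor similarity matrix. It is not the measure proposed in the paper, and the Document-Term-Document meta-path, the toy counts, the term similarity matrix and all function names are hypothetical.

```python
import math

def commuting_matrix(adj):
    """Count Document-Term-Document path instances: M = A * A^T."""
    n = len(adj)
    return [[sum(a * b for a, b in zip(adj[i], adj[j])) for j in range(n)]
            for i in range(n)]

def pathsim(m, x, y):
    """PathSim for a symmetric meta-path with commuting matrix m."""
    denom = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denom if denom else 0.0

def bilinear(a, s, b):
    """Compute a^T * S * b for plain Python lists."""
    return sum(a[i] * s[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def neighbor_soft_cosine(a, b, s):
    """Assumed stand-in for a distant similarity: soft cosine of two
    neighbor-count vectors under a neighbor similarity matrix s."""
    denom = math.sqrt(bilinear(a, s, a) * bilinear(b, s, b))
    return bilinear(a, s, b) / denom if denom else 0.0

# Toy document-term counts (hypothetical): documents 0 and 2 share no terms.
doc_term = [
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
]

# Hypothetical term-term similarity: terms 0/3 and 1/2 are related.
term_sim = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
]

m = commuting_matrix(doc_term)
connected = pathsim(m, 0, 1)   # documents that share terms
isolated = pathsim(m, 0, 2)    # no shared terms, so no path instances
distant = neighbor_soft_cosine(doc_term[0], doc_term[2], term_sim)
print(connected, isolated, distant)
```

With these toy values, PathSim is zero for documents 0 and 2 because no meta-path instance connects them, while the neighbor-based score is positive because their terms are related.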

Original language: English (US)
Title of host publication: CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 1629-1638
Number of pages: 10
ISBN (Electronic): 9781450349185
DOIs: https://doi.org/10.1145/3132847.3133029
State: Published - Nov 6 2017
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: Nov 6 2017 to Nov 10 2017

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
Volume: Part F131841

Other

Other: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country: Singapore
City: Singapore
Period: 11/6/17 to 11/10/17

Fingerprint

Information networks
Proximity
Leverage
Similarity measure
Art
Intuition
Semantic network
Data mining

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Wang, C., Song, Y., Li, H., Sun, Y., Zhang, M., & Han, J. (2017). Distant meta-path similarities for text-based heterogeneous information networks. In CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management (pp. 1629-1638). (International Conference on Information and Knowledge Management, Proceedings; Vol. Part F131841). Association for Computing Machinery. https://doi.org/10.1145/3132847.3133029

@inproceedings{5c7504382e8d496c9f15f576e3a7b825,
title = "Distant meta-path similarities for text-based heterogeneous information networks",
abstract = "Measuring network similarity is a fundamental data mining problem. Mainstream similarity measures mainly leverage structural information about the entities in the network without considering the network semantics. In the real world, heterogeneous information networks (HINs) with rich semantics are ubiquitous. However, existing network similarity measures do not generalize well to HINs because they fail to capture HIN semantics. The meta-path has been proposed and demonstrated as an effective way to represent semantics in HINs. Accordingly, the original meta-path based similarities (e.g., PathSim and KnowSim) have been successful in computing entity proximity in HINs. The intuition is that the more meta-path instances there are between two entities, the more similar the entities are. Thus the original meta-path similarity applies only to computing the proximity of two neighboring (connected) entities. In this paper, we propose the distant meta-path similarity, which is able to capture HIN semantics between two distant (isolated) entities and so provide more meaningful entity proximity. The main idea is that even if two entities share no neighborhood entities (i.e., no meta-path instances connect them), the more similar their neighborhood entities are, the more similar the two entities should be. We then find the optimal distant meta-path similarity by exploring the similarity hypothesis space based on different theoretical foundations. We show the state-of-the-art similarity performance of distant meta-path similarity on two text-based HINs and make the datasets publicly available.",
author = "Chenguang Wang and Yangqiu Song and Haoran Li and Yizhou Sun and Ming Zhang and Jiawei Han",
year = "2017",
month = "11",
day = "6",
doi = "10.1145/3132847.3133029",
language = "English (US)",
series = "International Conference on Information and Knowledge Management, Proceedings",
publisher = "Association for Computing Machinery",
pages = "1629--1638",
booktitle = "CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management",

}

TY - GEN

T1 - Distant meta-path similarities for text-based heterogeneous information networks

AU - Wang, Chenguang

AU - Song, Yangqiu

AU - Li, Haoran

AU - Sun, Yizhou

AU - Zhang, Ming

AU - Han, Jiawei

PY - 2017/11/6

Y1 - 2017/11/6

N2 - Measuring network similarity is a fundamental data mining problem. Mainstream similarity measures mainly leverage structural information about the entities in the network without considering the network semantics. In the real world, heterogeneous information networks (HINs) with rich semantics are ubiquitous. However, existing network similarity measures do not generalize well to HINs because they fail to capture HIN semantics. The meta-path has been proposed and demonstrated as an effective way to represent semantics in HINs. Accordingly, the original meta-path based similarities (e.g., PathSim and KnowSim) have been successful in computing entity proximity in HINs. The intuition is that the more meta-path instances there are between two entities, the more similar the entities are. Thus the original meta-path similarity applies only to computing the proximity of two neighboring (connected) entities. In this paper, we propose the distant meta-path similarity, which is able to capture HIN semantics between two distant (isolated) entities and so provide more meaningful entity proximity. The main idea is that even if two entities share no neighborhood entities (i.e., no meta-path instances connect them), the more similar their neighborhood entities are, the more similar the two entities should be. We then find the optimal distant meta-path similarity by exploring the similarity hypothesis space based on different theoretical foundations. We show the state-of-the-art similarity performance of distant meta-path similarity on two text-based HINs and make the datasets publicly available.

UR - http://www.scopus.com/inward/record.url?scp=85037378598&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037378598&partnerID=8YFLogxK

U2 - 10.1145/3132847.3133029

DO - 10.1145/3132847.3133029

M3 - Conference contribution

AN - SCOPUS:85037378598

T3 - International Conference on Information and Knowledge Management, Proceedings

SP - 1629

EP - 1638

BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management

PB - Association for Computing Machinery

ER -