RelSim: Relation similarity search in schema-rich heterogeneous information networks

Chenguang Wang, Yizhou Sun, Yanglei Song, Jiawei Han, Yangqiu Song, Lidan Wang, Ming Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent studies have demonstrated the power of modeling real-world data as heterogeneous information networks (HINs) consisting of multiple types of entities and relations. Unfortunately, most such studies (e.g., on similarity search) confine their discussion to networks with only a few entity and relationship types, such as DBLP. In the real world, however, the network schema can be rather complex, as in Freebase. In such schema-rich HINs, it is often too burdensome to ask users to provide explicit guidance in selecting relations for similarity search. In this paper, we study the problem of relation similarity search in schema-rich HINs. Under our problem setting, users are only asked to provide some simple relation instance examples (e.g., (Barack Obama, John Kerry) and (George W. Bush, Condoleezza Rice)) as a query, and we automatically detect the latent semantic relation (LSR) implied by the query (e.g., "president vs. secretary-of-state"). The LSR then helps to find other similar relation instances (e.g., (Bill Clinton, Madeleine Albright)). To solve the problem, we first define a new meta-path-based relation similarity measure, RelSim, to measure the similarity between relation instances in schema-rich HINs. Then, given a query, we propose an optimization model that efficiently learns the LSR implied by the query through linear programming, and we perform fast relation similarity search using RelSim based on the learned LSR. Experiments on real-world datasets derived from Freebase demonstrate the effectiveness and efficiency of our approach.
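
To make the idea of a meta-path-based relation similarity concrete, the following is a minimal sketch and not the paper's definition of RelSim. It assumes each relation instance (an entity pair) is described by counts of the meta-paths connecting the pair, and that an LSR is a set of non-negative weights over meta-paths. The weighted-overlap formula, the meta-path names, the counts, and the hand-set weights are all illustrative assumptions; in the paper the weights are learned from the query by linear programming.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: meta-path name -> number of path instances
# connecting the two entities of a relation instance.
MetaPathCounts = Dict[str, float]


def weighted_overlap(x: MetaPathCounts, y: MetaPathCounts,
                     weights: Dict[str, float]) -> float:
    """Illustrative similarity: weighted shared mass over weighted total mass."""
    paths = set(x) | set(y)
    shared = sum(weights.get(p, 0.0) * min(x.get(p, 0.0), y.get(p, 0.0))
                 for p in paths)
    total = sum(weights.get(p, 0.0) * (x.get(p, 0.0) + y.get(p, 0.0))
                for p in paths)
    return 2.0 * shared / total if total > 0 else 0.0


def rank(query: MetaPathCounts, candidates: Dict[str, MetaPathCounts],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate relation instances by similarity to the query instance."""
    scored = [(name, weighted_overlap(query, c, weights))
              for name, c in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumed, not taken from Freebase).
weights = {"appoints": 0.8, "same_party": 0.2, "spouse": 0.0}  # stand-in for a learned LSR
query = {"appoints": 1, "same_party": 1}          # e.g. (Barack Obama, John Kerry)
candidates = {
    "(Bill Clinton, Madeleine Albright)": {"appoints": 1, "same_party": 1},
    "(Person A, Person B)": {"spouse": 1, "same_party": 1},
}
ranking = rank(query, candidates, weights)
```

Under these assumed weights, a candidate pair that shares the heavily weighted meta-path with the query ranks above one that shares only a lightly weighted meta-path.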

Original language: English (US)
Title of host publication: 16th SIAM International Conference on Data Mining 2016, SDM 2016
Editors: Sanjay Chawla Venkatasubramanian, Wagner Meira
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 621-629
Number of pages: 9
ISBN (Electronic): 9781510828117
State: Published - 2016
Event: 16th SIAM International Conference on Data Mining 2016, SDM 2016 - Miami, United States
Duration: May 5 2016 to May 7 2016

Publication series

Name: 16th SIAM International Conference on Data Mining 2016, SDM 2016

Other

Other: 16th SIAM International Conference on Data Mining 2016, SDM 2016
Country: United States
City: Miami
Period: 5/5/16 to 5/7/16

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
