TY - GEN
T1 - RelSim
T2 - 16th SIAM International Conference on Data Mining 2016, SDM 2016
AU - Wang, Chenguang
AU - Sun, Yizhou
AU - Song, Yanglei
AU - Han, Jiawei
AU - Song, Yangqiu
AU - Wang, Lidan
AU - Zhang, Ming
N1 - Funding Information:
Chenguang Wang gratefully acknowledges the support of the National Natural Science Foundation of China (NSFC Grant Nos. 61472006 and 61272343), the National Basic Research Program (973 Program No. 2014CB340405), and the Doctoral Fund of the Ministry of Education of China (MOEC RFDP Grant No. 20130001110032). The research is partially supported by the U.S. Army Research Laboratory (ARL) under agreement W911NF-09-2-0053 and by DARPA under agreement No. FA8750-13-2-0008. Research is also partially sponsored by National Science Foundation LTS-1017362, IIS-1320617, IIS-1354329, HDTRA1-10-1-0120, CAREER No. 1453800 and Northeastern TIER 1, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov), and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Publisher Copyright:
Copyright © by SIAM.
PY - 2016
Y1 - 2016
N2 - Recent studies have demonstrated the power of modeling real-world data as heterogeneous information networks (HINs) consisting of multiple types of entities and relations. Unfortunately, most such studies (e.g., on similarity search) confine their discussion to networks with only a few entity and relationship types, such as DBLP. In the real world, however, the network schema can be rather complex, as in Freebase. In such HINs with rich schemas, it is often too burdensome to ask users to provide explicit guidance in selecting relations for similarity search. In this paper, we study the problem of relation similarity search in schema-rich HINs. Under our problem setting, users are only asked to provide some simple relation instance examples (e.g., (Barack Obama, John Kerry) and (George W. Bush, Condoleezza Rice)) as a query, and we automatically detect the latent semantic relation (LSR) implied by the query (e.g., "president vs. secretary-of-state"). Such an LSR helps find other similar relation instances (e.g., (Bill Clinton, Madeleine Albright)). To solve the problem, we first define a new meta-path-based relation similarity measure, RelSim, to measure the similarity between relation instances in schema-rich HINs. Then, given a query, we propose an optimization model to efficiently learn the LSR implied by the query through linear programming, and perform fast relation similarity search using RelSim based on the learned LSR. Experiments on real-world datasets derived from Freebase demonstrate the effectiveness and efficiency of our approach.
UR - http://www.scopus.com/inward/record.url?scp=84991687322&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991687322&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974348.70
DO - 10.1137/1.9781611974348.70
M3 - Conference contribution
AN - SCOPUS:84991687322
T3 - 16th SIAM International Conference on Data Mining 2016, SDM 2016
SP - 621
EP - 629
BT - 16th SIAM International Conference on Data Mining 2016, SDM 2016
A2 - Venkatasubramanian, Sanjay Chawla
A2 - Meira, Wagner
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 5 May 2016 through 7 May 2016
ER -