TY - GEN
T1 - Top-K interesting subgraph discovery in information networks
AU - Gupta, Manish
AU - Gao, Jing
AU - Yan, Xifeng
AU - Cam, Hasan
AU - Han, Jiawei
PY - 2014
Y1 - 2014
N2 - In the real world, various systems can be modeled using heterogeneous networks which consist of entities of different types. Many problems on such networks can be mapped to an underlying critical problem of discovering top-K subgraphs of entities with rare and surprising associations. Answering such subgraph queries efficiently involves two main challenges: (1) computing all matching subgraphs which satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the subgraphs. Previous work on the matching problem can be harnessed for a naïve ranking-after-matching solution. However, for large graphs, subgraph queries may have enormous number of matches, and so it is inefficient to compute all matches when only the top-K matches are desired. In this paper, we address the two challenges of matching and ranking in top-K subgraph discovery as follows. First, we introduce two index structures for the network: topology index, and graph maximum metapath weight index, which are both computed offline. Second, we propose novel top-K mechanisms to exploit these indexes for answering interesting subgraph queries online efficiently. Experimental results on several synthetic datasets and the DBLP and Wikipedia datasets containing thousands of entities show the efficiency and the effectiveness of the proposed approach in computing interesting subgraphs.
AB - In the real world, various systems can be modeled using heterogeneous networks which consist of entities of different types. Many problems on such networks can be mapped to an underlying critical problem of discovering top-K subgraphs of entities with rare and surprising associations. Answering such subgraph queries efficiently involves two main challenges: (1) computing all matching subgraphs which satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the subgraphs. Previous work on the matching problem can be harnessed for a naïve ranking-after-matching solution. However, for large graphs, subgraph queries may have enormous number of matches, and so it is inefficient to compute all matches when only the top-K matches are desired. In this paper, we address the two challenges of matching and ranking in top-K subgraph discovery as follows. First, we introduce two index structures for the network: topology index, and graph maximum metapath weight index, which are both computed offline. Second, we propose novel top-K mechanisms to exploit these indexes for answering interesting subgraph queries online efficiently. Experimental results on several synthetic datasets and the DBLP and Wikipedia datasets containing thousands of entities show the efficiency and the effectiveness of the proposed approach in computing interesting subgraphs.
UR - http://www.scopus.com/inward/record.url?scp=84901754968&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901754968&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2014.6816703
DO - 10.1109/ICDE.2014.6816703
M3 - Conference contribution
AN - SCOPUS:84901754968
SN - 9781479925544
T3 - Proceedings - International Conference on Data Engineering
SP - 820
EP - 831
BT - 2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
PB - IEEE Computer Society
T2 - 30th IEEE International Conference on Data Engineering, ICDE 2014
Y2 - 31 March 2014 through 4 April 2014
ER -