TY - GEN
T1 - Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks
AU - Zhuang, Honglei
AU - Zhang, Jing
AU - Brova, George
AU - Tang, Jie
AU - Cam, Hasan
AU - Yan, Xifeng
AU - Han, Jiawei
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/1/1
Y1 - 2014/1/1
N2 - Mining outliers in a heterogeneous information network is a challenging problem: It is even unclear what should be outliers in a large heterogeneous network (e.g., outliers in the entire bibliographic network consisting of authors, titles, papers and venues). In this study, we propose an interesting class of outliers, query-based subnetwork outliers: Given a heterogeneous network, a user raises a query to retrieve a set of task-relevant subnetworks, among which, subnetwork outliers are those that significantly deviate from others (e.g., outliers of author groups among those studying 'topic modeling'). We formalize this problem and propose a general framework, where one can query for finding subnetwork outliers with respect to different semantics. We introduce the notion of subnetwork similarity that captures the proximity between two subnetworks by their membership distributions. We propose an outlier detection algorithm to rank all the subnetworks according to their outlierness without tuning parameters. Our quantitative and qualitative experiments on both synthetic and real data sets show that the proposed method outperforms other baselines.
AB - Mining outliers in a heterogeneous information network is a challenging problem: It is even unclear what should be outliers in a large heterogeneous network (e.g., outliers in the entire bibliographic network consisting of authors, titles, papers and venues). In this study, we propose an interesting class of outliers, query-based subnetwork outliers: Given a heterogeneous network, a user raises a query to retrieve a set of task-relevant subnetworks, among which, subnetwork outliers are those that significantly deviate from others (e.g., outliers of author groups among those studying 'topic modeling'). We formalize this problem and propose a general framework, where one can query for finding subnetwork outliers with respect to different semantics. We introduce the notion of subnetwork similarity that captures the proximity between two subnetworks by their membership distributions. We propose an outlier detection algorithm to rank all the subnetworks according to their outlierness without tuning parameters. Our quantitative and qualitative experiments on both synthetic and real data sets show that the proposed method outperforms other baselines.
KW - heterogeneous information network
KW - outlier detection
KW - query-based
UR - http://www.scopus.com/inward/record.url?scp=84936971684&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84936971684&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2014.85
DO - 10.1109/ICDM.2014.85
M3 - Conference contribution
AN - SCOPUS:84936971684
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1127
EP - 1132
BT - Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
A2 - Kumar, Ravi
A2 - Toivonen, Hannu
A2 - Pei, Jian
A2 - Huang, Joshua Zhexue
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Data Mining, ICDM 2014
Y2 - 14 December 2014 through 17 December 2014
ER -