TY - GEN
T1 - Local learning for mining outlier subgraphs from network datasets
AU - Gupta, Manish
AU - Mallya, Arun
AU - Roy, Subhro
AU - Cho, Jason H.D.
AU - Han, Jiawei
N1 - Publisher Copyright:
© SIAM.
PY - 2014
Y1 - 2014
N2 - In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example for a co-authorship network, given a subgraph containing three authors, one expects all three authors to be say data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors, is also suspicious. Thus, existence of lowprobability links and absence of high-probability links can be a good indicator of subgraph outliemess. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.
AB - In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example for a co-authorship network, given a subgraph containing three authors, one expects all three authors to be say data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors, is also suspicious. Thus, existence of lowprobability links and absence of high-probability links can be a good indicator of subgraph outliemess. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.
UR - http://www.scopus.com/inward/record.url?scp=84909636779&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84909636779&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973440.9
DO - 10.1137/1.9781611973440.9
M3 - Conference contribution
AN - SCOPUS:84909636779
T3 - SIAM International Conference on Data Mining 2014, SDM 2014
SP - 73
EP - 81
BT - SIAM International Conference on Data Mining 2014, SDM 2014
A2 - Zaki, Mohammed J.
A2 - Banerjee, Arindam
A2 - Parthasarathy, Srinivasan
A2 - Ning-Tan, Pang
A2 - Obradovic, Zoran
A2 - Kamath, Chandrika
PB - Society for Industrial and Applied Mathematics Publications
T2 - 14th SIAM International Conference on Data Mining, SDM 2014
Y2 - 24 April 2014 through 26 April 2014
ER -