TY - GEN

T1 - Local learning for mining outlier subgraphs from network datasets

AU - Gupta, Manish

AU - Mallya, Arun

AU - Roy, Subhro

AU - Cho, Jason H.D.

AU - Han, Jiawei

PY - 2014/1/1

Y1 - 2014/1/1

N2 - In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example for a co-authorship network, given a subgraph containing three authors, one expects all three authors to be say data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors, is also suspicious. Thus, existence of lowprobability links and absence of high-probability links can be a good indicator of subgraph outliemess. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers. Copyright

AB - In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example for a co-authorship network, given a subgraph containing three authors, one expects all three authors to be say data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors, is also suspicious. Thus, existence of lowprobability links and absence of high-probability links can be a good indicator of subgraph outliemess. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers. Copyright

UR - http://www.scopus.com/inward/record.url?scp=84909636779&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84909636779&partnerID=8YFLogxK

U2 - 10.1137/1.9781611973440.9

DO - 10.1137/1.9781611973440.9

M3 - Conference contribution

AN - SCOPUS:84909636779

T3 - SIAM International Conference on Data Mining 2014, SDM 2014

SP - 73

EP - 81

BT - SIAM International Conference on Data Mining 2014, SDM 2014

A2 - Zaki, Mohammed J.

A2 - Banerjee, Arindam

A2 - Parthasarathy, Srinivasan

A2 - Ning-Tan, Pang

A2 - Obradovic, Zoran

A2 - Kamath, Chandrika

PB - Society for Industrial and Applied Mathematics Publications

T2 - 14th SIAM International Conference on Data Mining, SDM 2014

Y2 - 24 April 2014 through 26 April 2014

ER -