Local learning for mining outlier subgraphs from network datasets

Manish Gupta, Arun Mallya, Subhro Roy, Jason H.D. Cho, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For example for a co-authorship network, given a subgraph containing three authors, one expects all three authors to be say data mining authors. Also, one expects the neighborhood to mostly consist of data mining authors. But a 3-author clique of data mining authors with all theory authors in the neighborhood clearly seems interesting. Similarly, having one of the authors in the clique as a theory author when all other authors (both in the clique and neighborhood) are data mining authors, is also suspicious. Thus, existence of lowprobability links and absence of high-probability links can be a good indicator of subgraph outliemess. The probability of an edge can in turn be modeled based on the weighted similarity between the attribute values of the nodes linked by the edge. We claim that the attribute weights must be learned locally for accurate link existence probability computations. In this paper, we design a system that finds subgraph outliers given a graph and a query by modeling the problem as a linear optimization. Experimental results on several synthetic and real datasets show the effectiveness of the proposed approach in computing interesting outliers.

Original languageEnglish (US)
Title of host publicationSIAM International Conference on Data Mining 2014, SDM 2014
EditorsMohammed J. Zaki, Arindam Banerjee, Srinivasan Parthasarathy, Pang Ning-Tan, Zoran Obradovic, Chandrika Kamath
PublisherSociety for Industrial and Applied Mathematics Publications
Pages73-81
Number of pages9
ISBN (Electronic)9781510811515
DOIs
StatePublished - 2014
Event14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014Apr 26 2014

Publication series

NameSIAM International Conference on Data Mining 2014, SDM 2014
Volume1

Other

Other14th SIAM International Conference on Data Mining, SDM 2014
Country/TerritoryUnited States
CityPhiladelphia
Period4/24/144/26/14

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'Local learning for mining outlier subgraphs from network datasets'. Together they form a unique fingerprint.

Cite this