Community distribution outlier detection in heterogeneous information networks

Manish Gupta, Jing Gao, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Heterogeneous networks are ubiquitous. For example, bibliographic data, social data, medical records, movie data and many more can be modeled as heterogeneous networks. Rich information associated with multi-typed nodes in heterogeneous networks motivates us to propose a new definition of outliers, which is different from those defined for homogeneous networks. In this paper, we propose the novel concept of Community Distribution Outliers (CDOutliers) for heterogeneous information networks, which are defined as objects whose community distribution does not follow any of the popular community distribution patterns.We extract such outliers using a type-aware joint analysis of multiple types of objects. Given community membership matrices for all types of objects, we follow an iterative two-stage approach which performs pattern discovery and outlier detection in a tightly integrated manner. We first propose a novel outlier-aware approach based on joint non-negative matrix factorization to discover popular community distribution patterns for all the object types in a holistic manner, and then detect outliers based on such patterns. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting community distribution outliers.

Original languageEnglish (US)
Title of host publicationMachine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
Pages557-573
Number of pages17
EditionPART 1
DOIs
StatePublished - 2013
EventEuropean Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013 - Prague, Czech Republic
Duration: Sep 23 2013Sep 27 2013

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
NumberPART 1
Volume8188 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

OtherEuropean Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
Country/TerritoryCzech Republic
CityPrague
Period9/23/139/27/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Community distribution outlier detection in heterogeneous information networks'. Together they form a unique fingerprint.

Cite this