Ranking outliers using symmetric neighborhood relationship

Wen Jin, Anthony K.H. Tung, Jiawei Han, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Beyond classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. So far, the density distribution at an object's location has been estimated from the density distribution of its k-nearest neighbors [2, 11]. However, when an outlier lies where the density distributions in its neighborhood differ significantly, for example, an object from a sparse cluster close to a denser cluster, this can produce a wrong estimate. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers discovered are more meaningful. To compute such local outliers efficiently, we develop several mining algorithms that detect the top-n outliers under our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient to compute but also more effective in ranking outliers.
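The density estimate described in the abstract can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' algorithm. It assumes that an object's density is the inverse of its distance to its k-th nearest neighbor, that its symmetric neighborhood is the union of its k-nearest neighbors and its reverse k-nearest neighbors (objects that count it among their own k nearest), and that its score is the mean density of that neighborhood divided by its own density. The names `symmetric_outlier_scores` and `top_n_outliers` are illustrative. The brute-force neighbor search stands in for the efficient top-n mining algorithms the paper develops.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def symmetric_outlier_scores(points: Sequence[Point], k: int) -> List[float]:
    """Score each point against its symmetric (k-NN + reverse k-NN) neighborhood.

    Hypothetical sketch under stated assumptions, not the paper's code:
      * density(p) = 1 / distance from p to its k-th nearest neighbor;
      * neighborhood(p) = k nearest neighbors of p plus reverse neighbors of p;
      * score(p) = mean density of neighborhood(p) / density(p).
    Points are assumed distinct, so every k-distance is positive.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")

    # Brute-force k nearest neighbors; ties are broken by point index.
    knn: List[List[int]] = []
    kdist: List[float] = []
    for i in range(n):
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        knn.append([j for _, j in ranked[:k]])
        kdist.append(ranked[k - 1][0])  # distance to the k-th neighbor

    density = [1.0 / d for d in kdist]

    # q is a reverse neighbor of p when p appears in q's k-NN list.
    reverse: List[Set[int]] = [set() for _ in range(n)]
    for q in range(n):
        for p in knn[q]:
            reverse[p].add(q)

    scores: List[float] = []
    for p in range(n):
        neighborhood = set(knn[p]) | reverse[p]  # never empty, since k >= 1
        avg = sum(density[o] for o in neighborhood) / len(neighborhood)
        scores.append(avg / density[p])  # values well above 1 suggest an outlier
    return scores


def top_n_outliers(points: Sequence[Point], k: int, n_top: int) -> List[int]:
    """Return the indices of the n_top highest-scoring points."""
    scores = symmetric_outlier_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n_top]


if __name__ == "__main__":
    # Toy data: a small square cluster plus one distant point (index 5).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    print(top_n_outliers(pts, k=2, n_top=1))  # → [5]
```

Reverse neighbors are included so that a point in a sparse cluster next to a dense one is also compared with the sparse points that treat it as a neighbor. A pure k-nearest-neighbor estimate would compare it only with whichever points happen to be closest.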

Original language: English (US)
Title of host publication: Advances in Knowledge Discovery and Data Mining - 10th Pacific-Asia Conference, PAKDD 2006, Proceedings
Pages: 577-593
Number of pages: 17
State: Published - 2006
Event: 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2006 - Singapore, Singapore
Duration: Apr 9 2006 → Apr 12 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3918 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
