Mining top-n local outliers in large databases

Wen Jin, Anthony K.H. Tung, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection and video surveillance. Recent work on outlier detection introduced the notion of a local outlier, in which the degree to which an object is outlying depends on the density of its local neighborhood, and each object can be assigned a Local Outlier Factor (LOF) that represents the likelihood of that object being an outlier. Although the concept of local outliers is useful, computing LOF values for every data object requires a large number of k-nearest-neighbor searches and can be computationally expensive. Since most objects are usually not outliers, it is useful to give users the option of finding only the n most outstanding local outliers, i.e., the top-n data objects that are most likely to be local outliers according to their LOFs. However, if the pruning is not done carefully, finding the top-n outliers could require the same amount of computation as finding the LOF for all objects. In this paper, we propose a novel method to efficiently find the top-n local outliers in large databases. The concept of a "micro-cluster" is introduced to compress the data, and an efficient micro-cluster-based local outlier mining algorithm is designed based on this concept. As our algorithm can be adversely affected by overlap among the micro-clusters, we propose a meaningful cut-plane solution for overlapping data. Formal analysis and experiments show that this method achieves good performance in finding the most outstanding local outliers.
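
To make the cost described in the abstract concrete, the following is a minimal sketch of the naive baseline: compute the LOF of every object by brute force, then keep the n largest. It is not the micro-cluster algorithm proposed in the paper. It follows the standard LOF definition (k-distance, reachability distance, local reachability density) and assumes Euclidean distance, distinct points, and neighborhoods of exactly k objects (ties at the k-distance are not expanded). The function names `knn`, `lof_scores`, and `top_n_lof` are illustrative only.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Brute-force LOF for every point: one k-NN search per object."""
    neighbours = [knn(points, i, k) for i in range(len(points))]
    # k-distance of each point = distance to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    # Assumes distinct points, so the mean is never zero.
    lrd = []
    for nb in neighbours:
        mean_reach = sum(max(k_dist[j], d) for d, j in nb) / len(nb)
        lrd.append(1.0 / mean_reach)

    # LOF(p) = mean of lrd(o) / lrd(p) over the neighbours o of p.
    return [
        sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
        for i, nb in enumerate(neighbours)
    ]


def top_n_lof(points, k, n):
    """Return the n (index, LOF) pairs with the largest LOF values."""
    scores = lof_scores(points, k)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# A 3x3 grid of points plus one isolated point at index 9.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))
result = top_n_lof(points, k=3, n=1)
print(result)
```

Every object pays for a full k-NN search here even though only n results are kept, which is the redundancy the abstract says a top-n method should prune away.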

Original language: English (US)
Title of host publication: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: F. Provost, R. Srikant, M. Schkolnick, D. Lee
Publisher: Association for Computing Machinery
Pages: 293-298
Number of pages: 6
ISBN (Print): 158113391X, 9781581133912
State: Published - 2001
Externally published: Yes
Event: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) - San Francisco, CA, United States
Duration: Aug 26 2001 to Aug 29 2001

ASJC Scopus subject areas

  • General Engineering
