TY - GEN
T1 - Ranking outliers using symmetric neighborhood relationship
AU - Jin, Wen
AU - Tung, Anthony K.H.
AU - Han, Jiawei
AU - Wang, Wei
PY - 2006
Y1 - 2006
N2 - Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2, 11]. However, when outliers lie where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both neighbors and reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
AB - Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The estimation of the density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2, 11]. However, when outliers lie where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in a wrong estimation. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both neighbors and reverse neighbors of an object when estimating its density distribution. As a result, the outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detect top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in computation but also more effective in ranking outliers.
UR - http://www.scopus.com/inward/record.url?scp=33745772192&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745772192&partnerID=8YFLogxK
U2 - 10.1007/11731139_68
DO - 10.1007/11731139_68
M3 - Conference contribution
AN - SCOPUS:33745772192
SN - 3540332065
SN - 9783540332060
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 577
EP - 593
BT - Advances in Knowledge Discovery and Data Mining - 10th Pacific-Asia Conference, PAKDD 2006, Proceedings
T2 - 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2006
Y2 - 9 April 2006 through 12 April 2006
ER -