TY - GEN
T1 - Filtering and refinement
T2 - 9th IEEE International Conference on Data Mining, ICDM 2009
AU - Yu, Xiao
AU - Tang, Lu An
AU - Han, Jiawei
PY - 2009
Y1 - 2009
N2 - Anomaly detection is an important data mining task. Most existing methods treat anomalies as inconsistencies and spend most of their time modeling normal instances. A recently proposed sampling-based approach may substantially boost the efficiency of anomaly detection, but it may also weaken accuracy and robustness. In this study, we propose a two-stage approach that finds anomalies in complex datasets with high accuracy as well as low time complexity and space cost. Instead of analyzing normal instances, our algorithm first employs an efficient deterministic space-partitioning algorithm to eliminate obviously normal instances and generate a small set of anomaly candidates in a single scan of the dataset. It then checks each candidate against multiple density-based criteria to determine the final results. This two-stage framework can also detect anomalies under different notions of anomaly. Our experiments show that the new approach finds anomalies successfully under different conditions and achieves a good balance of efficiency, accuracy, and robustness.
AB - Anomaly detection is an important data mining task. Most existing methods treat anomalies as inconsistencies and spend most of their time modeling normal instances. A recently proposed sampling-based approach may substantially boost the efficiency of anomaly detection, but it may also weaken accuracy and robustness. In this study, we propose a two-stage approach that finds anomalies in complex datasets with high accuracy as well as low time complexity and space cost. Instead of analyzing normal instances, our algorithm first employs an efficient deterministic space-partitioning algorithm to eliminate obviously normal instances and generate a small set of anomaly candidates in a single scan of the dataset. It then checks each candidate against multiple density-based criteria to determine the final results. This two-stage framework can also detect anomalies under different notions of anomaly. Our experiments show that the new approach finds anomalies successfully under different conditions and achieves a good balance of efficiency, accuracy, and robustness.
UR - http://www.scopus.com/inward/record.url?scp=77951184519&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951184519&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2009.44
DO - 10.1109/ICDM.2009.44
M3 - Conference contribution
AN - SCOPUS:77951184519
SN - 9780769538952
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 617
EP - 626
BT - ICDM 2009 - The 9th IEEE International Conference on Data Mining
Y2 - 6 December 2009 through 9 December 2009
ER -