TY - GEN
T1 - GAD
T2 - 9th SIAM International Conference on Data Mining 2009, SDM 2009
AU - Jin, Xin
AU - Kim, Sangkyum
AU - Han, Jiawei
AU - Cao, Liangliang
AU - Yin, Zhijun
PY - 2009
Y1 - 2009
N2 - In this paper, we propose GAD (General Activity Detection) for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (1) an exact GAD algorithm, E-GAD, which is much faster than K-Means and produces the same clustering result; (2) approximate GAD algorithms under different assumptions, which are faster than E-GAD while achieving different degrees of approximation; and (3) GAD-based algorithms that handle the "large clusters" problem, which appears in many large-scale clustering applications. Two existing activity detection algorithms, GT and CGAUTC, are special cases of the framework. The most important contribution of our work is that the framework is a general solution for exploiting activity detection for fast clustering in both exact and approximate scenarios, and the algorithms we propose within the framework achieve very high speed. Extensive experiments on several large datasets from various real-world applications show that our proposed algorithms are effective and efficient.
UR - http://www.scopus.com/inward/record.url?scp=72849138680&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72849138680&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:72849138680
SN - 9781615671090
T3 - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
SP - 1
EP - 12
BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Y2 - 30 April 2009 through 2 May 2009
ER -