TY - GEN
T1 - Constraint-based clustering in large databases
AU - Tung, Anthony K.H.
AU - Han, Jiawei
AU - Lakshmanan, Laks V.S.
AU - Ng, Raymond T.
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2001
PY - 2001
Y1 - 2001
N2 - Constrained clustering-finding clusters that satisfy userspecified constraints|is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constraint-clustering algorithm is developed in this study which starts by finding an initial solution that satisfies user-specified constraints and then refines the solution by performing confined object movements under constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up for large data sets using the heuristic of micro-cluster sharing. By experiments, we show the effectiveness and effciency of the heuristics.
AB - Constrained clustering-finding clusters that satisfy userspecified constraints|is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constraint-clustering algorithm is developed in this study which starts by finding an initial solution that satisfies user-specified constraints and then refines the solution by performing confined object movements under constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up for large data sets using the heuristic of micro-cluster sharing. By experiments, we show the effectiveness and effciency of the heuristics.
UR - http://www.scopus.com/inward/record.url?scp=84949423737&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949423737&partnerID=8YFLogxK
U2 - 10.1007/3-540-44503-x_26
DO - 10.1007/3-540-44503-x_26
M3 - Conference contribution
AN - SCOPUS:84949423737
SN - 9783540414568
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 405
EP - 419
BT - Database Theory - ICDT 2001 - 8th International Conference, Proceedings
A2 - Van den Bussche, Jan
A2 - Vianu, Victor
PB - Springer
T2 - 8th International Conference on Database Theory, ICDT 2001
Y2 - 4 January 2001 through 6 January 2001
ER -