Constraint-based clustering in large databases

Anthony K.H. Tung, Jiawei Han, Laks V.S. Lakshmanan, Raymond T. Ng

Research output: Chapter in Book/Report/Conference proceedingConference contribution


Constrained clustering-finding clusters that satisfy userspecified constraints|is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constraint-clustering algorithm is developed in this study which starts by finding an initial solution that satisfies user-specified constraints and then refines the solution by performing confined object movements under constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up for large data sets using the heuristic of micro-cluster sharing. By experiments, we show the effectiveness and effciency of the heuristics.

Original languageEnglish (US)
Title of host publicationDatabase Theory - ICDT 2001 - 8th International Conference, Proceedings
EditorsJan Van den Bussche, Victor Vianu
Number of pages15
ISBN (Print)9783540414568
StatePublished - 2001
Externally publishedYes
Event8th International Conference on Database Theory, ICDT 2001 - London, United Kingdom
Duration: Jan 4 2001Jan 6 2001

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


Other8th International Conference on Database Theory, ICDT 2001
Country/TerritoryUnited Kingdom

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science


Dive into the research topics of 'Constraint-based clustering in large databases'. Together they form a unique fingerprint.

Cite this