Cross-relational clustering with user's guidance

Xiaoxin Yin, Jiawei Han, Gabrielle Dawn Allen

Research output: Contribution to conference › Paper › peer-review

Abstract

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional, and the related information is often spread across multiple relations. To ensure effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CROSSCLUS, which performs cross-relational clustering with user's guidance. We believe that user's guidance, even in very simple forms, can be essential for effective high-dimensional clustering, since the user knows the application requirements and data semantics well. CROSSCLUS is carried out as follows: a user specifies a clustering task and selects one or a small set of features pertinent to the task. CROSSCLUS then extracts the set of highly relevant features in multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on the user's guidance, and identifies interesting clusters that fit the user's needs. The method addresses both quality in feature extraction and efficiency in clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
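The workflow outlined in the abstract (a user picks a guidance feature, candidate features from linked relations are scored against it, and the relevant ones drive the clustering) can be illustrated with a small Python sketch. This is not the CROSSCLUS algorithm itself, whose details the abstract does not give. The toy relations, the function names, the cosine-based relevance score, the thresholds, and the single-link grouping are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy database: a target relation of students, plus features
# reached by joining other relations through foreign keys in the schema.
students = ["s1", "s2", "s3", "s4"]
links = {
    # feature name -> (student_id, value) rows produced by the join
    "group":  [("s1", "db"), ("s2", "db"), ("s3", "ai"), ("s4", "ai")],
    "course": [("s1", "sql"), ("s2", "sql"), ("s2", "ml"),
               ("s3", "ml"), ("s4", "ml")],
    "dorm":   [("s1", "north"), ("s2", "south"),
               ("s3", "north"), ("s4", "south")],
}

def value_distribution(rows):
    """Map each target tuple to a normalized distribution over feature values."""
    counts = defaultdict(Counter)
    for sid, value in rows:
        counts[sid][value] += 1
    return {sid: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for sid, cnt in counts.items()}

def pairwise_similarity(rows, ids):
    """Similarity of every pair of target tuples with respect to one feature."""
    dist = value_distribution(rows)
    sims = []
    for a, b in combinations(ids, 2):
        da, db = dist.get(a, {}), dist.get(b, {})
        sims.append(sum(p * db.get(v, 0.0) for v, p in da.items()))
    return sims

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def select_features(links, ids, guidance, threshold=0.5):
    """Assumed relevance measure: a feature is relevant if it groups the
    target tuples similarly to the user-chosen guidance feature."""
    guide = pairwise_similarity(links[guidance], ids)
    scores = {name: cosine(guide, pairwise_similarity(rows, ids))
              for name, rows in links.items()}
    selected = [name for name, s in scores.items() if s >= threshold]
    return scores, selected

def cluster(links, ids, features, threshold=0.5):
    """Single-link grouping (an assumption): connect two tuples when their
    mean similarity over the selected features reaches the threshold."""
    per_feature = [pairwise_similarity(links[f], ids) for f in features]
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for k, (a, b) in enumerate(combinations(ids, 2)):
        mean_sim = sum(sims[k] for sims in per_feature) / len(per_feature)
        if mean_sim >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    return sorted(groups.values())

scores, selected = select_features(links, students, guidance="group")
clusters = cluster(links, students, selected)
print(selected, clusters)
```

On this toy data the sketch keeps `group` and `course`, discards `dorm` because it groups students in a way unrelated to the guidance feature, and separates the students into two clusters.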

Original language: English (US)
Pages: 344-353
Number of pages: 10
State: Published - 2005
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005 → Aug 24 2005

Other

Other: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory: United States
City: Chicago, IL
Period: 8/21/05 → 8/24/05

Keywords

  • Clustering
  • Data mining
  • Relational databases

ASJC Scopus subject areas

  • Software
  • Information Systems
