Cross-relational clustering with user's guidance

Xiaoxin Yin, Jiawei Han, Philip S. Yu

Research output: Contribution to conference › Paper

Abstract

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information often spreads across multiple relations. To ensure effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CROSSCLUS, which performs cross-relational clustering with user's guidance. We believe that user guidance, even in very simple forms, can be essential for effective high-dimensional clustering, since a user understands the application requirements and data semantics. CROSSCLUS works as follows. A user specifies a clustering task and selects one feature, or a small set of features, pertinent to the task. CROSSCLUS then extracts highly relevant features from multiple relations connected via the linkages defined in the database schema, evaluates their effectiveness based on the user's guidance, and identifies interesting clusters that fit the user's needs. The method addresses both the quality of feature extraction and the efficiency of clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
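The abstract does not say how CROSSCLUS decides whether a feature found in another relation agrees with the feature the user selected. The sketch below is one plausible reading, written under stated assumptions and not the paper's own formulation. Each feature reached through a linkage is summarized, per target tuple, as a distribution over attribute values. Two tuples are compared by the overlap of their distributions. A candidate feature is then scored by the cosine between its vector of pairwise tuple similarities and that of the user's pivot feature. The toy relations (students, groups, courses, addresses) and all function names (`propagate_feature`, `rank_features`, and so on) are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch: scoring candidate features against a user-selected pivot
# feature. Relation names, attributes, and similarity measures are assumptions
# for illustration, not the formulation used in the paper.


def propagate_feature(target_ids, link, attr):
    """Follow one linkage (target id -> ids of linked tuples) and summarize, for each
    target tuple, the categorical attribute `attr` of its linked tuples as a
    normalized value distribution."""
    dists = {}
    for t in target_ids:
        counts = Counter(attr[x] for x in link.get(t, []))
        total = sum(counts.values())
        dists[t] = {v: c / total for v, c in counts.items()} if total else {}
    return dists


def tuple_similarity(d1, d2):
    """Overlap of two value distributions: sum of per-value minima, in [0, 1]."""
    return sum(min(p, d2.get(v, 0.0)) for v, p in d1.items())


def similarity_vector(target_ids, dists):
    """Pairwise tuple similarities (i < j) under one feature, in a fixed pair order."""
    ids = list(target_ids)
    return [
        tuple_similarity(dists[a], dists[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    ]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_features(target_ids, pivot, candidates):
    """Rank candidate features by how closely their tuple-similarity pattern
    matches that of the user's pivot feature (highest score first)."""
    pivot_vec = similarity_vector(target_ids, pivot)
    scores = {
        name: cosine(pivot_vec, similarity_vector(target_ids, dists))
        for name, dists in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): target relation Student, linked to other relations.
students = ["s1", "s2", "s3", "s4"]
membership = {"s1": ["g1"], "s2": ["g1"], "s3": ["g2"], "s4": ["g2"]}
group_area = {"g1": "DB", "g2": "AI"}
registration = {"s1": ["c1", "c2"], "s2": ["c1", "c3"], "s3": ["c3"], "s4": ["c3", "c4"]}
course_area = {"c1": "DB", "c2": "DB", "c3": "AI", "c4": "AI"}
address = {"s1": ["a1"], "s2": ["a2"], "s3": ["a3"], "s4": ["a4"]}
home_city = {"a1": "X", "a2": "Y", "a3": "X", "a4": "Y"}

# The user's guidance (assumed): "cluster students by research group area".
pivot = propagate_feature(students, membership, group_area)
candidates = {
    "course_area": propagate_feature(students, registration, course_area),
    "home_city": propagate_feature(students, address, home_city),
}
ranking = rank_features(students, pivot, candidates)
print(ranking)
```

In this toy data the course-area feature, reached through the registration linkage, scores well above home city, so it would be kept as a clustering feature. The final clustering step is omitted. The pairwise vector also grows quadratically with the number of target tuples, so this version ignores the efficiency concerns the abstract raises.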

Original language: English (US)
Pages: 344-353
Number of pages: 10
State: Published - Dec 1 2005
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005 - Aug 24 2005

Keywords

  • Clustering
  • Data mining
  • Relational databases

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Yin, X., Han, J., & Yu, P. S. (2005). Cross-relational clustering with user's guidance (pp. 344-353). Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States.