Cross-relational clustering with user's guidance

Xiaoxin Yin, Jiawei Han, Philip S. Yu

Research output: Contribution to conference (Paper)

Abstract

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information often spreads across multiple relations. To ensure effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CROSSCLUS, which performs cross-relational clustering with user's guidance. We believe that user's guidance, even likely in very simple forms, could be essential for effective high-dimensional clustering since a user knows well the application requirements and data semantics. CROSSCLUS is carried out as follows: A user specifies a clustering task and selects one or a small set of features pertinent to the task. CROSSCLUS extracts the set of highly relevant features in multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on user's guidance, and identifies interesting clusters that fit user's needs. This method takes care of both quality in feature extraction and efficiency in clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
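The workflow in the abstract (a user-selected feature guides which features from linked relations are used for clustering) can be illustrated with a minimal sketch. This is not the CROSSCLUS algorithm, whose details the abstract does not give. The toy schema (`students`, `works_in`, `takes`, `lives_in`), the cosine-based relevance score, the thresholds, and the connected-components grouping are all assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: a target relation plus (target key, value) pairs
# obtained by following foreign-key links into other relations.
students = ["s1", "s2", "s3", "s4"]
works_in = [("s1", "db"), ("s2", "db"), ("s3", "ai"), ("s4", "ai")]  # user-selected feature
takes = [("s1", "sql"), ("s1", "mining"), ("s2", "sql"),
         ("s3", "ml"), ("s4", "ml"), ("s4", "vision")]
lives_in = [("s1", "north"), ("s2", "south"), ("s3", "north"), ("s4", "south")]


def feature_vectors(links):
    """Map each target tuple to a normalized distribution over linked values."""
    counts = defaultdict(Counter)
    for key, value in links:
        counts[key][value] += 1
    return {k: {v: n / sum(c.values()) for v, n in c.items()}
            for k, c in counts.items()}


def tuple_similarity(vecs, a, b):
    """Dot product of two tuples' value distributions for one feature."""
    va, vb = vecs.get(a, {}), vecs.get(b, {})
    return sum(p * vb.get(v, 0.0) for v, p in va.items())


def pairwise_profile(links, keys):
    """Similarity of every unordered pair of target tuples under one feature."""
    vecs = feature_vectors(links)
    return [tuple_similarity(vecs, a, b)
            for i, a in enumerate(keys) for b in keys[i + 1:]]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def relevant_features(guide, candidates, keys, threshold=0.5):
    """Keep candidate features that group tuples similarly to the guide feature."""
    guide_profile = pairwise_profile(guide, keys)
    scores = {name: cosine(guide_profile, pairwise_profile(links, keys))
              for name, links in candidates.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


def cluster(keys, feature_links, min_sim=0.25):
    """Merge tuples whose average similarity over the features reaches min_sim."""
    vec_list = [feature_vectors(links) for links in feature_links]
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            avg = sum(tuple_similarity(v, a, b) for v in vec_list) / len(vec_list)
            if avg >= min_sim:
                parent[find(b)] = find(a)

    groups = defaultdict(list)
    for k in keys:
        groups[find(k)].append(k)
    return sorted(groups.values())


selected = relevant_features(works_in, {"takes": takes, "lives_in": lives_in}, students)
clusters = cluster(students, [works_in] + [{"takes": takes, "lives_in": lives_in}[n] for n in selected])
```

On this toy data, `takes` is kept because it groups students the same way as the guiding feature `works_in`, `lives_in` is dropped because it does not, and the students fall into the two groups `["s1", "s2"]` and `["s3", "s4"]`.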

Original language: English (US)
Pages: 344-353
Number of pages: 10
State: Published - Dec 1 2005
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005 to Aug 24 2005

Other

Other: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: Chicago, IL
Period: 8/21/05 to 8/24/05

Fingerprint

Data mining
Scalability
Feature extraction
Semantics
Experiments

Keywords

  • Clustering
  • Data mining
  • Relational databases

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Yin, X., Han, J., & Yu, P. S. (2005). Cross-relational clustering with user's guidance. 344-353. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States.

@conference{93d180669dec43da8891c9f4ccdbcbb6,
title = "Cross-relational clustering with user's guidance",
abstract = "Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information often spreads across multiple relations. To ensure effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CROSSCLUS, which performs cross-relational clustering with user's guidance. We believe that user's guidance, even likely in very simple forms, could be essential for effective high-dimensional clustering since a user knows well the application requirements and data semantics. CROSSCLUS is carried out as follows: A user specifies a clustering task and selects one or a small set of features pertinent to the task. CROSSCLUS extracts the set of highly relevant features in multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on user's guidance, and identifies interesting clusters that fit user's needs. This method takes care of both quality in feature extraction and efficiency in clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.",
keywords = "Clustering, Data mining, Relational databases",
author = "Xiaoxin Yin and Jiawei Han and Yu, {Philip S.}",
year = "2005",
month = "12",
day = "1",
language = "English (US)",
pages = "344--353",
note = "KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 21-08-2005 Through 24-08-2005",
}

TY  - CONF
T1  - Cross-relational clustering with user's guidance
AU  - Yin, Xiaoxin
AU  - Han, Jiawei
AU  - Yu, Philip S.
PY  - 2005/12/1
Y1  - 2005/12/1
AB  - Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information often spreads across multiple relations. To ensure effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CROSSCLUS, which performs cross-relational clustering with user's guidance. We believe that user's guidance, even likely in very simple forms, could be essential for effective high-dimensional clustering since a user knows well the application requirements and data semantics. CROSSCLUS is carried out as follows: A user specifies a clustering task and selects one or a small set of features pertinent to the task. CROSSCLUS extracts the set of highly relevant features in multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on user's guidance, and identifies interesting clusters that fit user's needs. This method takes care of both quality in feature extraction and efficiency in clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
KW  - Clustering
KW  - Data mining
KW  - Relational databases
UR  - http://www.scopus.com/inward/record.url?scp=32344441804&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=32344441804&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:32344441804
SP  - 344
EP  - 353
ER  -