Abstract
Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional, and the relevant information is often spread across multiple relations. To enable effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CROSSCLUS, which performs cross-relational clustering under user guidance. We believe that user guidance, even in very simple forms, can be essential for effective high-dimensional clustering, since a user knows the application requirements and data semantics well. CROSSCLUS proceeds as follows: a user specifies a clustering task and selects one feature or a small set of features pertinent to the task. CROSSCLUS then extracts the set of highly relevant features in multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on the user's guidance, and identifies interesting clusters that fit the user's needs. The method addresses both the quality of feature extraction and the efficiency of clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
Original language | English (US) |
---|---|
Pages | 344-353 |
Number of pages | 10 |
State | Published - 2005 |
Event | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States. Duration: Aug 21 2005 → Aug 24 2005 |
Keywords
- Clustering
- Data mining
- Relational databases
ASJC Scopus subject areas
- Software
- Information Systems