Supporting ranking and clustering as generalized order-by and group-by

Chengkai Li, Min Wang, Lipyeow Lim, Haixun Wang, Kevin Chen Chuan Chang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The Boolean semantics of SQL queries cannot adequately capture the "fuzzy" preferences and "soft" criteria required in non-traditional data retrieval applications. One way to solve this problem is to add a flavor of "information retrieval" into database queries by allowing fuzzy query conditions and flexibly supporting grouping and ranking of the query results within the DBMS engine. While ranking is already supported by all major commercial DBMSs natively, support of flexibly grouping is still very limited (i.e., group-by). In this paper, we propose to generalize group-by to enable flexible grouping (clustering specifically) of the query results. Different from clustering in data mining applications, our focus is on supporting efficient clustering of Boolean results generated at query time. Moreover, we propose to integrate ranking and clustering with Boolean conditions, forming a new type of ClusterRank query to allow structured data retrieval. Such an integration is nontrivial in terms of both semantics and query processing. We investigate various semantics of this type of queries. To process such queries, a straightforward approach is to simply glue the techniques developed for ranking-only and clustering-only together. This approach is costly since both ranking and clustering are treated as blocking post-processing tasks upon Boolean query results by existing techniques. We propose a summary-based evaluation method that utilizes bitmap index to seamlessly integrate Boolean conditions, clustering, and ranking. Experimental study shows that our approach significantly outperforms the straightforward one and maintains high clustering quality.

Original languageEnglish (US)
Title of host publicationSIGMOD 2007
Subtitle of host publicationProceedings of the ACM SIGMOD International Conference on Management of Data
Pages127-138
Number of pages12
DOIs
StatePublished - Oct 30 2007
EventSIGMOD 2007: ACM SIGMOD International Conference on Management of Data - Beijing, China
Duration: Jun 12 2007Jun 14 2007

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)0730-8078

Other

OtherSIGMOD 2007: ACM SIGMOD International Conference on Management of Data
CountryChina
CityBeijing
Period6/12/076/14/07

Keywords

  • Clustering
  • Data exploration
  • Grouping
  • Query processing
  • Ranking
  • Retrieval
  • Top-k

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'Supporting ranking and clustering as generalized order-by and group-by'. Together they form a unique fingerprint.

Cite this