Probabilistic top-k and ranking-aggregate queries

Mohamed A. Soliman, Ihab F. Ilyas, Kevin Chen Chuan Chang

Research output: Contribution to journalArticlepeer-review

Abstract

Ranking and aggregation queries are widely used in data exploration, data analysis, and decision-making scenarios. While most of the currently proposed ranking and aggregation techniques focus on deterministic data, several emerging applications involve data that is unclean or uncertain. Ranking and aggregating uncertain (probabilistic) data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, uncertainty imposes probability as a new ranking dimension that does not exist in the traditional settings. In this article we introduce new probabilistic formulations for top-k and ranking-aggregate queries in probabilistic databases. Our formulations are based on marriage of traditional top-k semantics with possible worlds semantics. In the light of these formulations, we construct a generic processing framework supporting both query types, and leveraging existing query processing and indexing capabilities in current RDBMSs. The framework encapsulates a state space model and efficient search algorithms to compute query answers. Our proposed techniques minimize the number of accessed tuples and the size of materialized search space to compute query answers. Our experimental study shows the efficiency of our techniques under different data distributions with orders of magnitude improvement over nave methods.

Original languageEnglish (US)
Article number13
JournalACM Transactions on Database Systems
Volume33
Issue number3
DOIs
StatePublished - Aug 1 2008

Keywords

  • Aggregation
  • Probabilistic data
  • Query processing
  • Ranking
  • Top-k

ASJC Scopus subject areas

  • Information Systems

Fingerprint

Dive into the research topics of 'Probabilistic top-k and ranking-aggregate queries'. Together they form a unique fingerprint.

Cite this