Accelerating approximate aggregation queries with expensive predicates

Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

Research output: Contribution to journalConference articlepeer-review

Abstract

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). As these DNNs are expensive and because many applications can tolerate approximate answers, analysts are interested in accelerating these queries via approximations. Unfortunately, standard approximate query processing techniques to accelerate such queries are not applicable because they assume the result of the predicates are available ahead of time. Furthermore, recent work using cheap approximations (i.e., proxies) do not support aggregation queries with predicates. To accelerate aggregation queries with expensive predicates, we develop and analyze a query processing algorithm that leverages proxies (ABae). ABae must account for the key challenge that it may sample records that do not satisfy the predicate. To address this challenge, we first use the proxy to group records into strata so that records satisfying the predicate are ideally grouped into few strata. Given these strata, ABae uses pilot sampling and plugin estimates to sample according to the optimal allocation. We show that ABae converges at an optimal rate in a novel analysis of stratified sampling with draws that may not satisfy the predicate. We further show that ABae outperforms on baselines on six real-world datasets, reducing labeling costs by up to 2.3×.

Original languageEnglish (US)
Pages (from-to)2341-2354
Number of pages14
JournalProceedings of the VLDB Endowment
Volume14
Issue number11
DOIs
StatePublished - 2021
Externally publishedYes
Event47th International Conference on Very Large Data Bases, VLDB 2021 - Virtual, Online
Duration: Aug 16 2021Aug 20 2021

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'Accelerating approximate aggregation queries with expensive predicates'. Together they form a unique fingerprint.

Cite this