Approximate summaries for why and why-not provenance

Seokki Lee, Bertram Ludäscher, Boris Glavic

Research output: Contribution to journal › Conference article › peer-review


Why and why-not provenance have been studied extensively in recent years. However, why-not provenance and - to a lesser degree - why provenance can be very large, resulting in severe scalability and usability challenges. We introduce a novel approximate summarization technique for provenance to address these challenges. Our approach uses patterns to encode why and why-not provenance concisely. We develop techniques for efficiently computing provenance summaries that balance informativeness, conciseness, and completeness. To achieve scalability, we integrate sampling techniques into provenance capture and summarization. Our approach is the first to both scale to large datasets and generate comprehensive and meaningful summaries.
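As a rough illustration of what it can mean to encode provenance with patterns and to use sampling for scalability, the sketch below treats a pattern as a tuple in which some positions are fixed constants and others are wildcards, and estimates the fraction of provenance rows a pattern covers from a random sample. This is not the authors' algorithm: the wildcard representation, the coverage measure, and the names `matches`, `coverage`, and `estimate_coverage` are all assumptions made for this example.

```python
import random
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a pattern is a tuple where None is a wildcard
# and any other value must match the row's value at that position exactly.
Pattern = Tuple[Optional[str], ...]
Row = Tuple[str, ...]


def matches(pattern: Pattern, row: Row) -> bool:
    """Return True if every non-wildcard position of the pattern equals the row."""
    return len(pattern) == len(row) and all(
        p is None or p == v for p, v in zip(pattern, row)
    )


def coverage(pattern: Pattern, rows: Sequence[Row]) -> float:
    """Fraction of rows matched by the pattern (0.0 for an empty input)."""
    if not rows:
        return 0.0
    return sum(matches(pattern, r) for r in rows) / len(rows)


def estimate_coverage(
    pattern: Pattern, rows: Sequence[Row], sample_size: int, seed: int = 0
) -> float:
    """Estimate coverage from a uniform random sample instead of all rows."""
    if len(rows) <= sample_size:
        return coverage(pattern, rows)
    rng = random.Random(seed)
    return coverage(pattern, rng.sample(list(rows), sample_size))


# Toy provenance rows (city, year); the data is invented for illustration.
provenance = [("NY", "2019"), ("NY", "2020"), ("LA", "2020"), ("NY", "2021")]
ny_any_year: Pattern = ("NY", None)
exact = coverage(ny_any_year, provenance)
```

Here a single pattern `("NY", None)` stands in for three of the four rows, which is the sense in which a pattern can be more concise than listing the provenance it describes. The sampled estimate trades exactness for speed on large inputs.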

Original language: English (US)
Pages (from-to): 912-924
Number of pages: 13
Journal: Proceedings of the VLDB Endowment
Issue number: 6
State: Published - 2020
Event: 46th International Conference on Very Large Data Bases, VLDB 2020 - Virtual, Japan
Duration: Aug 31 2020 to Sep 4 2020

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

