Integrating approximate summarization with provenance capture

Seokki Lee, Xing Niu, Bertram Ludäscher, Boris Glavic

Research output: Contribution to conference › Paper › peer-review

Abstract

How to use provenance to explain why a query returns a result or why a result is missing has been studied extensively. Recently, we have demonstrated how to uniformly answer these types of provenance questions for first-order queries with negation and have presented an implementation of this approach in our PUG (Provenance Unification through Graphs) system. However, for realistically sized databases, the provenance of answers and missing answers can be very large, overwhelming the user with too much information and wasting computational resources. In this paper, we introduce an (approximate) summarization technique that generates compact representations of why and why-not provenance. Our technique uses patterns as a summarized representation of sets of elements from the provenance, i.e., successful or failed derivations. We rank these patterns based on their descriptiveness (we use precision and recall as quality measures for patterns) and return only the top-k summaries. We demonstrate how this summarization technique can be integrated with provenance capture to compute summaries on demand and how sampling techniques can be employed to speed up both the summarization and capture steps. Our preliminary experiments demonstrate that this summarization technique scales to large instances of a real-world dataset.
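
The pattern-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the PUG implementation: the tuple encoding of derivations, the wildcard-based pattern generation, the use of an F-score to combine precision and recall, and all function names are assumptions made only for illustration.

```python
from itertools import product

WILDCARD = None  # hypothetical placeholder that matches any value


def matches(pattern, derivation):
    """A pattern matches a derivation if every non-wildcard position agrees."""
    return all(p is WILDCARD or p == d for p, d in zip(pattern, derivation))


def candidate_patterns(derivations):
    """Generalize each derivation by replacing any subset of positions with a wildcard."""
    patterns = set()
    for d in derivations:
        for mask in product([False, True], repeat=len(d)):
            patterns.add(tuple(WILDCARD if hide else v for v, hide in zip(d, mask)))
    return patterns


def top_k_patterns(target, other, k):
    """Rank patterns by how well they describe `target` (e.g. successful
    derivations) without also covering `other` (e.g. failed derivations)."""
    scored = []
    for p in candidate_patterns(target):
        tp = sum(matches(p, d) for d in target)  # >= 1 by construction
        fp = sum(matches(p, d) for d in other)
        precision = tp / (tp + fp)
        recall = tp / len(target)
        # Assumption: combine the two quality measures with the F-score.
        f_score = 2 * precision * recall / (precision + recall)
        scored.append((f_score, precision, recall, p))
    # Sort by descending score; break ties deterministically on the pattern text.
    scored.sort(key=lambda s: (-s[0], repr(s[3])))
    return scored[:k]


# Toy data (invented for this sketch): (city, transport) derivations.
successful = [("Seattle", "train"), ("Seattle", "bus"), ("Chicago", "train")]
failed = [("Chicago", "bus"), ("NewYork", "bus")]

for f_score, precision, recall, pattern in top_k_patterns(successful, failed, k=2):
    print(pattern, round(precision, 2), round(recall, 2), round(f_score, 2))
```

Enumerating every wildcard combination is exponential in the derivation width, so a sketch like this is only practical for small inputs; running it over a random sample of the derivations is one way to approximate the sampling speed-up the abstract mentions.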

Original language: English (US)
State: Published - 2017
Event: 9th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2017 - Seattle, United States
Duration: Jun 23 2017 → …

Conference

Conference: 9th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2017
Country: United States
City: Seattle
Period: 6/23/17 → …

ASJC Scopus subject areas

  • Computer Science(all)
