TY - GEN
T1 - A SQL-Middleware unifying why and why-not provenance for first-order queries
AU - Lee, Seokki
AU - Köhler, Sven
AU - Ludäscher, Bertram
AU - Glavic, Boris
N1 - Work supported in part by NSF awards SMA-1637155 and ACI-1541450.
PY - 2017/5/16
Y1 - 2017/5/16
N2 - Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we present the first practical approach for answering such questions for queries with negation (firstorder queries). Our approach is based on a rewriting of Datalog rules (called firing rules) that captures successful rule derivations within the context of a Datalog query. We extend this rewriting to support negation and to capture failed derivations that explain missing answers. Given a (why or why-not) provenance question, we compute an explanation, i.e., the part of the provenance that is relevant to answer the question. We introduce optimizations that prune parts of a provenance graph early on if we can determine that they will not be part of the explanation for a given question. We present an implementation that runs on top of a relational database using SQL to compute explanations. Our experiments demonstrate that our approach scales to large instances and significantly outperforms an earlier approach which instantiates the full provenance to compute explanations.
AB - Explaining why an answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we present the first practical approach for answering such questions for queries with negation (firstorder queries). Our approach is based on a rewriting of Datalog rules (called firing rules) that captures successful rule derivations within the context of a Datalog query. We extend this rewriting to support negation and to capture failed derivations that explain missing answers. Given a (why or why-not) provenance question, we compute an explanation, i.e., the part of the provenance that is relevant to answer the question. We introduce optimizations that prune parts of a provenance graph early on if we can determine that they will not be part of the explanation for a given question. We present an implementation that runs on top of a relational database using SQL to compute explanations. Our experiments demonstrate that our approach scales to large instances and significantly outperforms an earlier approach which instantiates the full provenance to compute explanations.
UR - http://www.scopus.com/inward/record.url?scp=85021225834&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021225834&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2017.105
DO - 10.1109/ICDE.2017.105
M3 - Conference contribution
AN - SCOPUS:85021225834
T3 - Proceedings - International Conference on Data Engineering
SP - 485
EP - 496
BT - Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
PB - IEEE Computer Society
T2 - 33rd IEEE International Conference on Data Engineering, ICDE 2017
Y2 - 19 April 2017 through 22 April 2017
ER -