TY - GEN
T1 - ProPub
T2 - 23rd International Conference on Scientific and Statistical Database Management, SSDBM 2011
AU - Dey, Saumen C.
AU - Zinn, Daniel
AU - Ludäscher, Bertram
PY - 2011
Y1 - 2011
N2 - Data provenance, i.e., the lineage and processing history of data, is becoming increasingly important in scientific applications. Provenance information can be used, e.g., to explain, debug, and reproduce the results of computational experiments, or to determine the validity and quality of data products. In collaborative science settings, it may be infeasible or undesirable to publish the complete provenance of a data product. We develop a framework that allows data publishers to "customize" provenance data prior to exporting it. For example, users can specify which parts of the provenance graph are to be included in the result and which parts should be hidden, anonymized, or abstracted. However, such user-defined provenance customization needs to be carefully counterbalanced with the need to faithfully report all relevant data and process dependencies. To this end, we propose ProPub (Provenance Publisher), a framework and system which allows the user (i) to state provenance publication and customization requests, (ii) to specify provenance policies that should be obeyed, (iii) to check whether the policies are satisfied, and (iv) to repair policy violations and reconcile conflicts between user requests and provenance policies should they occur. In the ProPub approach, policies as well as customization requests are expressed as logic rules. By using a declarative, logic-based framework, ProPub can first check and then enforce integrity constraints (ICs), e.g., by rejecting inconsistent user requests, or by repairing violated ICs according to a given conflict resolution strategy.
UR - http://www.scopus.com/inward/record.url?scp=79961182683&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79961182683&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-22351-8_13
DO - 10.1007/978-3-642-22351-8_13
M3 - Conference contribution
AN - SCOPUS:79961182683
SN - 9783642223501
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 225
EP - 243
BT - Scientific and Statistical Database Management - 23rd International Conference, SSDBM 2011, Proceedings
Y2 - 20 July 2011 through 22 July 2011
ER -