Linking prospective and retrospective provenance in scripts

Saumen Dey, Khalid Belhajjame, David Koop, Meghan Raul, Bertram Ludäscher

Research output: Contribution to conference › Paper › peer-review


Scripting languages like Python, R, and MATLAB have seen significant use across a variety of scientific domains. To assist scientists in the analysis of script executions, a number of mechanisms, e.g., noWorkflow, have recently been proposed to capture the provenance of script executions. The recorded provenance information can be used, e.g., to trace the lineage of a particular result by identifying the data inputs and the processing steps that were used to produce it. By and large, the provenance information captured for scripts is fine-grained in the sense that it captures data dependencies at the level of individual script statements, and does so for every variable within the script. While useful, the amount of recorded provenance information can be overwhelming for users and cumbersome to use. This suggests the need for abstraction mechanisms that focus attention on the specific parts of the provenance that are relevant to an analysis. Toward this goal, we propose that fine-grained provenance information recorded as the result of script execution can be abstracted using user-specified, workflow-like views. Specifically, we show how the provenance traces recorded by noWorkflow can be mapped to the workflow specifications that YesWorkflow generates from scripts based on user annotations. We examine the issues in constructing a successful mapping, provide an initial implementation of our solution, and present competency queries illustrating how a workflow view generated from the script can be used to explore the provenance recorded during script execution.
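
The idea of abstracting a fine-grained trace through a workflow-like view can be illustrated with a minimal sketch. It does not use the noWorkflow or YesWorkflow APIs or data models; the `Block` and `TraceRecord` structures, the line ranges, and the variable names are all hypothetical, and assigning each record to the innermost enclosing block by line number is only one plausible mapping strategy, not necessarily the one the paper implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A workflow block delimited by user annotations (hypothetical structure)."""
    name: str
    first_line: int
    last_line: int


@dataclass(frozen=True)
class TraceRecord:
    """One fine-grained runtime event (hypothetical, heavily simplified)."""
    line: int
    variable: str
    depends_on: tuple


def innermost_block(blocks, line):
    """Return the smallest block whose line range contains `line`, or None."""
    candidates = [b for b in blocks if b.first_line <= line <= b.last_line]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.last_line - b.first_line)


def abstract_trace(blocks, trace):
    """Group fine-grained trace records under the block that contains them."""
    view = {}
    for rec in trace:
        block = innermost_block(blocks, rec.line)
        key = block.name if block else "<unannotated>"
        view.setdefault(key, []).append(rec)
    return view


# Prospective side: blocks a user might have annotated in a script (made up).
blocks = [
    Block("main", 1, 20),
    Block("load_data", 2, 6),
    Block("fit_model", 8, 15),
]

# Retrospective side: statement-level records from one run (made up).
trace = [
    TraceRecord(3, "raw", ()),
    TraceRecord(5, "clean", ("raw",)),
    TraceRecord(10, "model", ("clean",)),
    TraceRecord(18, "report", ("model",)),
]

view = abstract_trace(blocks, trace)
for name, records in view.items():
    print(name, [r.variable for r in records])
```

With a view like this, a lineage question can be asked at the block level (which block produced `model`, and from which upstream block's outputs) rather than statement by statement.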

Original language: English (US)
State: Published - 2015
Event: 7th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2015 - Edinburgh, United Kingdom
Duration: Jul 8 2015 to Jul 9 2015


Conference: 7th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2015
Country/Territory: United Kingdom

ASJC Scopus subject areas

  • Computer Science (all)

