Retrospective provenance without a runtime provenance recorder

Timothy McPhillips, Shawn Bowers, Khalid Belhajjame, Bertram Ludäscher

Research output: Contribution to conference (Paper, peer-reviewed)

Abstract

The YesWorkflow (YW) toolkit aims to provide users of scripting languages such as Python, Perl, and R with many of the benefits of scientific workflow automation. YW requires neither the use of a workflow engine nor the overhead of adapting or instrumenting code to run in such a system. Instead, YW enables scientists to annotate their scripts with special comments that reveal the main computational blocks and dataflow dependencies otherwise implicit in scripts. YW tools extract and analyze these comments, represent scripts in terms of entities based on a typical scientific workflow model, and provide graphical workflow views (i.e., prospective provenance) of scripts. In this paper, we present a new extension of YW for inferring retrospective provenance from script executions without relying on a runtime provenance recorder. Instead we exploit the common practice of scientists to embed important pieces of provenance in directory structures and file names. For such “provenance-friendly” data organizations, we offer a new annotation mechanism based on URI templates. YW uses these to link conceptual-level prospective provenance with data files created at runtime, resulting in a powerful, integrated model of prospective and retrospective provenance. We present scientifically meaningful retrospective provenance queries for investigating an execution of a data acquisition workflow implemented as a Python script, and show how these queries can be evaluated using the YW toolkit.
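
As an illustration of the idea in the abstract, the following is a minimal sketch of how a template with `{variable}` placeholders could be matched against file paths to recover provenance embedded in directory and file names. It is not the YW implementation or its annotation syntax: the function names, the template, and the example paths are all hypothetical, and only the simple `{name}` placeholder form is handled.

```python
import re


def template_to_regex(template):
    """Compile a simple '{name}' template into a regex with named groups.

    Hypothetical helper: it supports only plain '{name}' placeholders,
    not the full URI template syntax.
    """
    # re.split with a capturing group alternates literal text (even
    # indices) with placeholder names (odd indices).
    parts = re.split(r"\{(\w+)\}", template)
    seen = set()
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)
        elif part in seen:
            # A repeated placeholder must carry the same value each time.
            pattern += f"(?P={part})"
        else:
            seen.add(part)
            # A value may not span a directory separator.
            pattern += f"(?P<{part}>[^/]+?)"
    return re.compile(pattern)


def match_outputs(template, paths):
    """Return (path, variables) pairs for the paths that fit the template."""
    regex = template_to_regex(template)
    matches = []
    for path in paths:
        m = regex.fullmatch(path)
        if m:
            matches.append((path, m.groupdict()))
    return matches


# Hypothetical template and file listing, for illustration only.
TEMPLATE = "run/data/{sample}/{sample}_{energy}eV_{frame}.img"
PATHS = [
    "run/data/DRT240/DRT240_11000eV_028.img",
    "run/data/DRT240/DRT322_11000eV_028.img",  # sample names disagree
    "run/log.txt",
]

for path, variables in match_outputs(TEMPLATE, PATHS):
    print(path, variables)
```

Under these assumptions, each matched file yields a set of variable bindings (here a sample, an energy, and a frame number) that could then be queried alongside the conceptual workflow model. Paths that do not fit the template, including one whose directory and file name disagree on the sample, are skipped.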

Original language: English (US)
State: Published - 2015
Event: 7th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2015 - Edinburgh, United Kingdom
Duration: Jul 8 2015 to Jul 9 2015

Conference

Conference: 7th USENIX Workshop on the Theory and Practice of Provenance, TaPP 2015
Country/Territory: United Kingdom
City: Edinburgh
Period: 7/8/15 to 7/9/15

ASJC Scopus subject areas

  • General Computer Science
