Abstract
Scientific workflows often benefit from or even require advanced modeling constructs, e.g., nesting of subworkflows, cycles for executing loops, data-dependent routing, and pipelined execution. In such settings, an often overlooked aspect of provenance takes center stage: a suitable model of provenance (MoP) for scientific workflows should be based upon the underlying model of computation (MoC) used for executing the workflows. We can derive an adequate MoP from a MoC (such as Kahn's process networks) by taking into account the assumptions that the MoC entails, and by recording the observables that it affords. In this way, a MoP captures or at least better approximates 'real' data dependencies for workflows with advanced modeling constructs. As a specific instance, we elaborate on the Read-Write-ReSet model, a simple and flexible MoP suitable for a number of different MoCs.
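To make the idea of recording observables concrete, the following is a minimal sketch of how read, write, and reset events might be logged for a single actor and turned into data dependencies. The abstract does not spell out the model's rules, so the dependency rule used here (each written token depends on every token read since the last reset) is an assumption, and the names `ActorTrace`, `window`, and `deps` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ActorTrace:
    """Hypothetical recorder of read/write/reset events for one actor."""
    window: list = field(default_factory=list)  # tokens read since the last reset
    deps: dict = field(default_factory=dict)    # written token -> tokens it depends on

    def read(self, token):
        # Observable: the actor consumed an input token.
        self.window.append(token)

    def write(self, token):
        # Assumed rule: an output depends on all inputs read since the last reset.
        self.deps[token] = list(self.window)

    def reset(self):
        # Observable: the actor discarded its state, so earlier reads no longer count.
        self.window.clear()


trace = ActorTrace()
trace.read("a1")
trace.read("a2")
trace.write("b1")
trace.reset()
trace.read("a3")
trace.write("b2")
```

Under these assumptions, `b1` is attributed to `a1` and `a2`, while `b2` is attributed only to `a3`. Without the reset event, `b2` would be attributed to all three inputs, which is the coarser approximation of the data dependencies that a pipelined actor would otherwise yield.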
Original language | English (US) |
---|---|
Pages (from-to) | 507-518 |
Number of pages | 12 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 20 |
Issue number | 5 |
State | Published - Apr 10 2008 |
Externally published | Yes |
Keywords
- Computation model
- Provenance
- Scientific workflow
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Computer Networks and Communications
- Computer Science Applications
- Computational Theory and Mathematics