Project histories: Managing data provenance across collection-oriented scientific workflow runs

Shawn Bowers, Timothy McPhillips, Martin Wu, Bertram Ludäscher

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

While a number of scientific workflow systems support data provenance, they primarily focus on collecting and querying provenance for single workflow runs. Scientific research projects, however, typically involve (1) many interrelated workflows (where data from one or more workflow runs are selected and used as input to subsequent runs) and (2) tasks between workflow runs that cannot be fully automated. This paper addresses the need for recording data dependencies across multiple workflow runs and accommodating data management activities performed between runs. We define a new conceptual model for representing project-level provenance based on the notion of project histories and folders, and describe mechanisms to support this model in the collection-oriented modeling and design framework of KEPLER. Our approach allows users to conveniently organize their projects and data using the familiar folder-hierarchy metaphor, while at the same time integrating this information with detailed provenance of data products generated via automated scientific workflows.

Original language: English (US)
Title of host publication: Data Integration in the Life Sciences - 4th International Workshop, DILS 2007, Proceedings
Publisher: Springer
Pages: 122-138
Number of pages: 17
ISBN (Print): 3540732543, 9783540732549
State: Published - 2007
Externally published: Yes
Event: 4th International Workshop on Data Integration in the Life Sciences, DILS 2007 - Philadelphia, PA, United States
Duration: Jun 27 2007 to Jun 29 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4544 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Workshop on Data Integration in the Life Sciences, DILS 2007
Country/Territory: United States
City: Philadelphia, PA
Period: 6/27/07 to 6/29/07

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
