Exploring scientific workflow provenance using hybrid queries over nested data and lineage graphs

Manish Kumar Anand, Shawn Bowers, Timothy McPhillips, Bertram Ludäscher

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Existing approaches for representing the provenance of scientific workflow runs largely ignore computation models that work over structured data, including XML. Unlike models based on transformation semantics, these computation models often employ update semantics, in which only a portion of an incoming XML stream is modified by each workflow step. Applying conventional provenance approaches to such models results in provenance information that is either too coarse (e.g., stating that one version of an XML document depends entirely on a prior version) or potentially incorrect (e.g., stating that each element of an XML document depends on every element in a prior version). We describe a generic provenance model that naturally represents workflow runs involving processes that work over nested data collections and that employ update semantics. Moreover, we extend current query approaches to support our model, enabling queries to be posed not only over data lineage relationships, but also over versions of nested data structures produced during a workflow run. We show how hybrid queries can be expressed against our model using high-level query constructs and implemented efficiently over relational provenance storage schemes.

Original languageEnglish (US)
Title of host publicationScientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings
Pages237-254
Number of pages18
DOIs
StatePublished - Aug 27 2009
Externally publishedYes
Event21st International Conference on Scientific and Statistical Database Management, SSDBM 2009 - New Orleans, LA, United States
Duration: Jun 2 2009Jun 4 2009

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume5566 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other21st International Conference on Scientific and Statistical Database Management, SSDBM 2009
CountryUnited States
CityNew Orleans, LA
Period6/2/096/4/09

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Exploring scientific workflow provenance using hybrid queries over nested data and lineage graphs'. Together they form a unique fingerprint.

Cite this