TY - GEN
T1 - Project histories
T2 - 4th International Workshop on Data Integration in the Life Sciences, DILS 2007
AU - Bowers, Shawn
AU - McPhillips, Timothy
AU - Wu, Martin
AU - Ludäscher, Bertram
PY - 2007
Y1 - 2007
N2 - While a number of scientific workflow systems support data provenance, they primarily focus on collecting and querying provenance for single workflow runs. Scientific research projects, however, typically involve (1) many interrelated workflows (where data from one or more workflow runs are selected and used as input to subsequent runs) and (2) tasks between workflow runs that cannot be fully automated. This paper addresses the need for recording data dependencies across multiple workflow runs and accommodating data management activities performed between runs. We define a new conceptual model for representing project-level provenance based on the notion of project histories and folders, and describe mechanisms to support this model in the collection-oriented modeling and design framework of KEPLER. Our approach allows users to conveniently organize their projects and data using the familiar folder-hierarchy metaphor, while at the same time integrating this information with detailed provenance of data products generated via automated scientific workflows.
UR - http://www.scopus.com/inward/record.url?scp=34547452653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547452653&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73255-6_12
DO - 10.1007/978-3-540-73255-6_12
M3 - Conference contribution
AN - SCOPUS:34547452653
SN - 3540732543
SN - 9783540732549
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 122
EP - 138
BT - Data Integration in the Life Sciences - 4th International Workshop, DILS 2007, Proceedings
PB - Springer
Y2 - 27 June 2007 through 29 June 2007
ER -