Collection-oriented scientific workflows for integrating and analyzing biological data

Timothy McPhillips, Shawn Bowers, Bertram Ludäscher

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Steps in scientific workflows often generate collections of results, causing the data flowing through workflows to become increasingly nested. Because conventional workflow components (or actors) typically operate on simple or application-specific data types, additional actors often are required to manage these nested data collections. As a result, conventional workflows become increasingly complex as data becomes more nested. This paper describes a new paradigm for developing scientific workflows that transparently manages nested data collections. Collection-oriented workflows have a number of advantages over conventional approaches, including simpler workflow designs (e.g., requiring fewer actors and control-flow constructs) that are invariant under changes in data nesting. Our implementation within the KEPLER scientific workflow system enables the explicit representation of collections and collection schemas, supports concurrent operation over collection contents via multi-level pipeline parallelism, and allows collection-aware actors to be composed readily from conventional actors.
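
The idea of composing a collection-aware actor from a conventional one, so that the workflow does not change when the data nesting changes, can be illustrated with a minimal sketch. This is not the KEPLER implementation or its API: plain Python lists stand in for nested collections, the names `lift` and `gc_content` are hypothetical, and pipeline parallelism and collection schemas are left out.

```python
from typing import Any, Callable


def lift(actor: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a conventional actor so it applies to every data item
    inside an arbitrarily nested collection (assumption: a list)."""
    def collection_aware(data: Any) -> Any:
        if isinstance(data, list):
            # Recurse into the collection and preserve its structure.
            return [collection_aware(item) for item in data]
        # Leaf data item: hand it to the conventional actor unchanged.
        return actor(data)
    return collection_aware


def gc_content(seq: str) -> float:
    """Hypothetical conventional actor: GC fraction of one DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


gc_all = lift(gc_content)

flat = ["GGCC", "ATAT"]
nested = [["GGCC"], [["ATAT"], []]]

print(gc_all(flat))    # [1.0, 0.0]
print(gc_all(nested))  # [[1.0], [[0.0], []]]
```

The same `gc_all` handles both inputs without extra looping or unpacking steps, which is the sense in which such a design stays invariant under changes in data nesting.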

Original language: English (US)
Title of host publication: Data Integration in the Life Sciences - Third International Workshop, DILS 2006, Proceedings
Publisher: Springer
Pages: 248-263
Number of pages: 16
ISBN (Print): 3540365931, 9783540365938
State: Published - 2006
Externally published: Yes
Event: 3rd International Workshop on Data Integration in the Life Sciences, DILS 2006 - Hinxton, United Kingdom
Duration: Jul 20 2006 to Jul 22 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4075 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd International Workshop on Data Integration in the Life Sciences, DILS 2006
Country/Territory: United Kingdom
City: Hinxton
Period: 7/20/06 to 7/22/06

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
