Scientific workflow design 2.0: Demonstrating streaming data collections in Kepler

Lei Dou, Daniel Zinn, Timothy McPhillips, Sven Kohler, Sean Riddle, Shawn Bowers, Bertram Ludaescher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scientific workflow systems are used to integrate existing software components (actors) into larger analysis pipelines to perform in silico experiments. Current approaches for handling data in nested-collection structures, as required in many scientific domains, lead to many record-management actors (shims) that make the workflow structure overly complex, and as a consequence hard to construct, evolve and maintain. By constructing and executing workflows from bioinformatics and geosciences in the Kepler system, we will demonstrate how COMAD (Collection-Oriented Modeling and Design), an extension of conventional workflow design, addresses these shortcomings. In particular, COMAD provides a hierarchical data stream model (as in XML) and a novel declarative configuration language for actors that functions as a middleware layer between the workflow's data model (streaming nested collections) and the actor's data model (base data and lists thereof). Our approach allows actor developers to focus on the internal actor processing logic oblivious to the workflow structure. Actors can then be re-used in various workflows simply by adapting actor configurations. Due to streaming nested collections and declarative configurations, COMAD workflows can usually be realized as linear data processing pipelines, which often reflect the scientific data analysis intention better than conventional designs. This linear structure not only simplifies actor insertions and deletions (workflow evolution), but also decreases the overall complexity of the workflow, reducing future effort in maintenance.
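The idea in the abstract of streaming nested collections through a linear pipeline, with a declarative configuration sitting between the stream and each actor, can be illustrated with a small sketch. This is not Kepler or COMAD code: the token classes `Open`, `Close`, and `Data`, the helper `scoped_actor`, and its `scope`/`reads`/`writes` parameters are hypothetical names chosen for illustration, and the real COMAD configuration language is not reproduced here. The sketch only assumes that a nested collection can be flattened into a stream of open, data, and close tokens (as in XML), and that an actor's plain function sees only a list of base values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Union


# Hypothetical token types: a nested collection flattened into a stream.
@dataclass(frozen=True)
class Open:
    label: str


@dataclass(frozen=True)
class Close:
    label: str


@dataclass(frozen=True)
class Data:
    label: str
    value: object


Token = Union[Open, Close, Data]


def scoped_actor(scope: str, reads: str, writes: str,
                 fn: Callable[[List[object]], object]):
    """Wrap a plain function fn(list) -> value as a stream stage.

    The (scope, reads, writes) triple stands in for a declarative
    configuration: for each collection labelled `scope`, gather the values
    of `reads` tokens inside it, call fn on them, and add one `writes`
    token just before the collection closes. All other tokens pass
    through unchanged, so stages can be chained linearly.
    """
    def stage(stream: Iterable[Token]) -> Iterator[Token]:
        pending: List[List[object]] = []  # one input list per open matching collection
        for tok in stream:
            if isinstance(tok, Open) and tok.label == scope:
                pending.append([])
            elif isinstance(tok, Data) and tok.label == reads and pending:
                pending[-1].append(tok.value)
            elif isinstance(tok, Close) and tok.label == scope and pending:
                # Emit the result inside the collection, before its close token.
                yield Data(writes, fn(pending.pop()))
            yield tok
    return stage


# Example data (made up): a project holding two samples of readings.
stream = [
    Open("Project"),
    Open("Sample"), Data("Reading", 1.0), Data("Reading", 3.0), Close("Sample"),
    Open("Sample"), Data("Reading", 10.0), Close("Sample"),
    Close("Project"),
]

# The actor logic is an ordinary function; only the configuration differs.
mean = scoped_actor("Sample", "Reading", "Mean", lambda xs: sum(xs) / len(xs))
count = scoped_actor("Project", "Mean", "SampleCount", len)

# A linear pipeline: each stage consumes and re-emits the whole stream.
out = list(count(mean(stream)))
```

In this sketch, inserting or removing a stage means adding or dropping one call in the chain, and reusing a function in another workflow means changing only the `scope`, `reads`, and `writes` values, which mirrors the workflow-evolution and actor-reuse points made in the abstract.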

Original language: English (US)
Title of host publication: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Pages: 1296-1299
Number of pages: 4
State: Published - Jun 6 2011
Externally published: Yes
Event: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 - Hannover, Germany
Duration: Apr 11 2011 → Apr 16 2011

Other

Other: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Country: Germany
City: Hannover
Period: 4/11/11 → 4/16/11

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

