TY - GEN
T1 - Scientific workflow design with data assembly lines
AU - Zinn, Daniel
AU - Bowers, Shawn
AU - McPhillips, Timothy
AU - Ludäscher, Bertram
PY - 2009
Y1 - 2009
AB - Despite an increasing interest in scientific workflow technologies in recent years, workflow design remains a challenging, slow, and often error-prone process, thus limiting the speed of further adoption of scientific workflows. Based on practical experience with data-driven workflows, we identify and illustrate a number of recurring scientific workflow design challenges, i.e., parameter-rich functions; data assembly, disassembly, and cohesion; conditional execution; iteration; and, more generally, workflow evolution. In conventional approaches, such challenges usually lead to the introduction of different types of "shims", i.e., intermediary workflow steps that act as adapters between otherwise incorrectly wired components. However, relying heavily on the use of shims leads to brittle (i.e., change-intolerant) workflow designs that are hard to comprehend and maintain. To this end, we present a general workflow design paradigm called virtual data assembly lines (VDAL). In this paper, we show how the VDAL approach can overcome common scientific workflow design challenges and improve workflow designs by exploiting (i) a semistructured, nested data model like XML, (ii) a flexible, statically analyzable configuration mechanism (e.g., an XQuery fragment), and (iii) an underlying virtual assembly line model that is resilient to workflow and data changes. The approach has been implemented as Kepler/COMAD, and applied to improve the design of complex, real-world workflows.
UR - http://www.scopus.com/inward/record.url?scp=74049117535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74049117535&partnerID=8YFLogxK
U2 - 10.1145/1645164.1645178
DO - 10.1145/1645164.1645178
M3 - Conference contribution
AN - SCOPUS:74049117535
SN - 9781605587172
T3 - Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS '09, in Conjunction with SC 2009
BT - Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS '09, in Conjunction with SC 2009
T2 - 4th Workshop on Workflows in Support of Large-Scale Science, WORKS '09, in Conjunction with SC 2009
Y2 - 16 November 2009 through 16 November 2009
ER -