TY - GEN
T1 - Enabling Scientific Workflow Reuse through Structured Composition of Dataflow and Control-Flow
AU - Bowers, Shawn
AU - Ludäscher, Bertram
AU - Ngu, Anne H.H.
AU - Critchlow, Terence
N1 - Publisher Copyright: © 2006 IEEE.
PY - 2006
Y1 - 2006
N2 - Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, modeling "control-flow intensive" tasks using dataflow constructs often leads to overly complicated workflows that are hard to comprehend, reuse, and maintain. We describe a generic framework, based on scientific workflow templates and frames, for embedding control-flow intensive subtasks within dataflow process networks. This approach can seamlessly handle complex control-flow without sacrificing the benefits of dataflow. We illustrate our approach with a real-world scientific workflow from the astrophysics domain, requiring remote execution and file transfer in a semi-reliable environment. For such workflows, we also describe a 3-layered architecture based on frames and templates where the top-layer consists of an overall dataflow process network, the second layer consists of a transducer template for modeling the desired control-flow behavior, and the bottom layer consists of frames inside the template that are specialized by embedding the desired component implementation. Our approach can enable scientific workflows that are more robust (fault-tolerance strategies can be defined by control-flow driven transducer templates) and at the same time more reusable, since the embedding of frames and templates yields more structured and modular workflow designs.
AB - Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, modeling "control-flow intensive" tasks using dataflow constructs often leads to overly complicated workflows that are hard to comprehend, reuse, and maintain. We describe a generic framework, based on scientific workflow templates and frames, for embedding control-flow intensive subtasks within dataflow process networks. This approach can seamlessly handle complex control-flow without sacrificing the benefits of dataflow. We illustrate our approach with a real-world scientific workflow from the astrophysics domain, requiring remote execution and file transfer in a semi-reliable environment. For such workflows, we also describe a 3-layered architecture based on frames and templates where the top-layer consists of an overall dataflow process network, the second layer consists of a transducer template for modeling the desired control-flow behavior, and the bottom layer consists of frames inside the template that are specialized by embedding the desired component implementation. Our approach can enable scientific workflows that are more robust (fault-tolerance strategies can be defined by control-flow driven transducer templates) and at the same time more reusable, since the embedding of frames and templates yields more structured and modular workflow designs.
UR - http://www.scopus.com/inward/record.url?scp=77949574990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77949574990&partnerID=8YFLogxK
U2 - 10.1109/ICDEW.2006.55
DO - 10.1109/ICDEW.2006.55
M3 - Conference contribution
AN - SCOPUS:77949574990
T3 - ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops
SP - 70
EP - 79
BT - ICDEW 2006 - Proceedings of the 22nd International Conference on Data Engineering Workshops
A2 - Barga, Roger S.
A2 - Zhou, Xiaofang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd International Conference on Data Engineering Workshops, ICDEW 2006
Y2 - 3 April 2006 through 7 April 2006
ER -