X-CSR: Dataflow optimization for distributed XML process pipelines

Daniel Zinn, Shawn Bowers, Timothy McPhillips, Bertram Ludäscher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called δ-XML that is well suited for applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of δ-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR optimization employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as [1].
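The core idea above, that a processor only needs to see the stream fragments its input type can match while everything else bypasses it, can be illustrated with a small Python sketch. It is not the X-CSR implementation from the paper and makes several simplifying assumptions: a dictionary of allowed child elements stands in for an XML Schema, "relevant" is reduced to subtrees rooted at a single element name (`scope`), the root is assumed to be a collection element rather than a `scope` element, and "shipping" is simulated by a local function call. All names (`may_contain`, `cut`, `reassemble`, `run_step`, the placeholder tag) are hypothetical.

```python
# Hypothetical sketch of schema-driven fragment bypassing; not the paper's implementation.
import copy
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for an XML Schema: element name -> allowed child element names.
Schema = Dict[str, List[str]]

# Hypothetical placeholder tag, namespaced so it cannot clash with collection data.
HOLE = "{urn:example:x-csr-sketch}hole"


def may_contain(schema: Schema, scope: str) -> Set[str]:
    """Element types whose subtrees can (transitively) contain a `scope` element.

    Derived from the schema alone, before any data is read (the "static" part).
    """
    result: Set[str] = set()
    changed = True
    while changed:  # iterate to a fixpoint so recursive schemas terminate
        changed = False
        for parent, children in schema.items():
            if parent not in result and any(c == scope or c in result for c in children):
                result.add(parent)
                changed = True
    return result


def cut(elem: ET.Element, scope: str, containers: Set[str],
        shipped: List[ET.Element]) -> None:
    """Swap each `scope` fragment for a numbered placeholder and collect it."""
    for i, child in enumerate(list(elem)):
        if child.tag == scope:
            hole = ET.Element(HOLE, {"id": str(len(shipped))})
            hole.tail, child.tail = child.tail, None  # keep surrounding text in place
            shipped.append(child)
            elem[i] = hole
        elif child.tag in containers:
            cut(child, scope, containers, shipped)
        # Any other child cannot contain `scope` per the schema, so it is bypassed unread.


def reassemble(elem: ET.Element, results: List[ET.Element]) -> None:
    """Put processed fragments back at their placeholders, preserving order."""
    for i, child in enumerate(list(elem)):
        if child.tag == HOLE:
            result = results[int(child.get("id"))]
            result.tail = child.tail
            elem[i] = result
        else:
            reassemble(child, results)


def run_step(doc: ET.Element, schema: Schema, scope: str,
             actor: Callable[[ET.Element], ET.Element]) -> ET.Element:
    """Run one pipeline step, giving `actor` only the fragments it can use."""
    work = copy.deepcopy(doc)  # leave the caller's document untouched
    shipped: List[ET.Element] = []
    cut(work, scope, may_contain(schema, scope), shipped)
    # "Ship": in a distributed run only these fragments would leave this node.
    results = [actor(fragment) for fragment in shipped]
    reassemble(work, results)
    return work


# Example: an actor that only needs <image> elements from a toy collection.
schema: Schema = {
    "collection": ["dataset"],
    "dataset": ["image", "log"],
    "image": [],
    "log": ["entry"],
    "entry": [],
}
doc = ET.fromstring(
    "<collection><dataset><image id='a'/><log><entry>large log</entry></log>"
    "<image id='b'/></dataset></collection>"
)


def mark_processed(img: ET.Element) -> ET.Element:
    img.set("processed", "yes")
    return img


out = run_step(doc, schema, "image", mark_processed)
print(ET.tostring(out, encoding="unicode"))
```

Because the schema says a `log` element can never contain an `image`, the `log` subtree is never inspected or handed to the actor; the numbered placeholders let the processed fragments be put back in their original positions, so the overall collection structure stays intact.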

Original language: English (US)
Title of host publication: Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
Pages: 577-580
Number of pages: 4
State: Published - 2009
Externally published: Yes
Event: 25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration: Mar 29 2009 to Apr 2 2009

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 25th IEEE International Conference on Data Engineering, ICDE 2009
Country/Territory: China
City: Shanghai
Period: 3/29/09 to 4/2/09

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
