Turbine: A distributed-memory dataflow engine for high performance many-task applications

Justin M. Wozniak, Timothy G. Armstrong, Ketan Maheshwari, Ewing L. Lusk, Daniel S. Katz, Michael Wilde, Ian T. Foster

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled coarse-grained tasks, each comprising a tightly coupled parallel function or program. 'Many-task' programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level through multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and intertask data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
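
The dependency-driven execution described above can be illustrated with a toy, single-process sketch. Assumptions: the class `DataflowEngine` and its methods `rule`, `store`, and `run` are hypothetical names invented for this example and are not Turbine's interface; data items are modeled as write-once variables, and a task becomes ready only once all of its inputs have been assigned. Turbine itself distributes tasks and data across many processes (the keywords list MPI and ADLB), which this sketch omits entirely.

```python
from collections import deque
from typing import Any, Callable, Dict, List


class DataflowEngine:
    """Toy single-process dataflow engine (hypothetical; not Turbine's API).

    Data items are write-once variables identified by name. A task fires
    only after every one of its input variables has been assigned.
    """

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.waiting: Dict[str, List[dict]] = {}  # var name -> tasks blocked on it
        self.ready: deque = deque()               # tasks whose inputs are all set
        self.order: List[str] = []                # task names in execution order

    def store(self, name: str, value: Any) -> None:
        # Write-once semantics: assigning a variable twice is an error.
        if name in self.values:
            raise ValueError(f"variable {name!r} already assigned")
        self.values[name] = value
        # Decrement each blocked task's pending count; a task becomes
        # ready when its count reaches zero.
        for task in self.waiting.pop(name, []):
            task["pending"] -= 1
            if task["pending"] == 0:
                self.ready.append(task)

    def rule(self, name: str, inputs: List[str], output: str,
             fn: Callable[..., Any]) -> None:
        # Register a task that computes `output` from `inputs` once all are set.
        task = {"name": name, "inputs": inputs, "output": output,
                "fn": fn, "pending": 0}
        for var in inputs:
            if var not in self.values:
                task["pending"] += 1
                self.waiting.setdefault(var, []).append(task)
        if task["pending"] == 0:
            self.ready.append(task)

    def run(self) -> None:
        # Execute ready tasks until none remain; storing each task's output
        # may release further tasks that were waiting on it.
        while self.ready:
            task = self.ready.popleft()
            args = [self.values[v] for v in task["inputs"]]
            self.order.append(task["name"])
            self.store(task["output"], task["fn"](*args))


if __name__ == "__main__":
    engine = DataflowEngine()
    engine.rule("sum", ["x", "y"], "s", lambda a, b: a + b)
    engine.rule("square", ["s"], "sq", lambda s: s * s)
    engine.store("x", 3)
    engine.store("y", 4)
    engine.run()
    print(engine.values["sq"])  # 49
    print(engine.order)         # ['sum', 'square']
```

Because a task is released by the store of its last missing input, nothing has to poll for readiness: completing one task is what makes its dependents runnable, which is the property a distributed engine must preserve when data and tasks live on different processes.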

Original language: English (US)
Pages (from-to): 337-366
Number of pages: 30
Journal: Fundamenta Informaticae
Volume: 128
Issue number: 3
State: Published - 2013
Externally published: Yes

Keywords

  • ADLB
  • MPI
  • Swift
  • Turbine
  • dataflow language

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Algebra and Number Theory
  • Information Systems
  • Computational Theory and Mathematics
