Evaluating distributed execution of workloads

Matteo Turilli, Yadu Nand Babuji, Andre Merzky, Ming Tai Ha, Michael Wilde, Daniel S. Katz, Shantenu Jha

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Resource selection and task placement for distributed execution pose conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and the Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scales and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources, and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance of these results and their implications for other tools and workflow systems.

Original language: English (US)
Title of host publication: Proceedings - 13th IEEE International Conference on eScience, eScience 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 276-285
Number of pages: 10
ISBN (Electronic): 9781538626863
State: Published - Nov 14 2017
Event: 13th IEEE International Conference on eScience, eScience 2017 - Auckland, New Zealand
Duration: Oct 24 2017 - Oct 27 2017

Publication series

Name: Proceedings - 13th IEEE International Conference on eScience, eScience 2017

Other

Other: 13th IEEE International Conference on eScience, eScience 2017
Country/Territory: New Zealand
City: Auckland
Period: 10/24/17 - 10/27/17

Keywords

  • distributed cyberinfrastructure
  • distributed execution
  • workflow systems

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (miscellaneous)
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computer Networks and Communications
  • Computer Science Applications
  • Computers in Earth Sciences
  • Social Sciences (miscellaneous)
