Dynamic scheduling for work agglomeration on heterogeneous clusters

Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions targeted at the GPU may decrease performance on the CPU. This problem is typically ameliorated by statically scheduling a fixed amount of work for agglomeration. However, determining the ideal amount of work to compose requires experimentation because it varies between architectures and problem configurations. This paper describes a novel methodology for dynamically agglomerating work units at runtime and scheduling them on accelerators. This approach is demonstrated in the context of two applications: an n-body particle simulation, which offloads particle interaction work, and a parallel dense LU solver, which relocates DGEMM kernels to the GPU. In both cases, dynamic agglomeration yields results comparable to or better than statically scheduling the work across a variety of system configurations.
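
The abstract does not spell out the scheduling policy, so the following is only a minimal sketch of the general idea of agglomerating work units at runtime rather than fixing the batch size statically. Everything in it is an assumption made for illustration: the class name `DynamicAgglomerator`, the `launch` callback, the throughput-driven hill-climbing rule, and the simulated cost model (a fixed offload overhead plus a per-unit cost) are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Callable, Deque, List


class DynamicAgglomerator:
    """Buffers small work units and offloads them in combined batches.

    Hypothetical illustration: the batch size is tuned at runtime by a
    simple hill-climbing rule on observed throughput. This is an assumed
    policy, not the one described in the paper.
    """

    def __init__(self, launch: Callable[[List[int]], float],
                 batch_size: int = 4, min_size: int = 1, max_size: int = 32):
        self.launch = launch          # runs one combined batch, returns elapsed time
        self.batch_size = batch_size  # current agglomeration target
        self.min_size = min_size
        self.max_size = max_size
        self.pending: Deque[int] = deque()
        self.last_rate = 0.0          # throughput (units per time) of the previous batch
        self.step = 2.0               # >1 grows the batch size, <1 shrinks it

    def submit(self, unit: int) -> None:
        """Queue one work unit; offload once enough units have accumulated."""
        self.pending.append(unit)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self, adapt: bool = True) -> None:
        """Offload up to batch_size pending units as a single batch."""
        if not self.pending:
            return
        n = min(len(self.pending), self.batch_size)
        batch = [self.pending.popleft() for _ in range(n)]
        elapsed = self.launch(batch)
        if adapt and elapsed > 0:
            self._adapt(len(batch) / elapsed)

    def drain(self) -> None:
        """Offload leftovers without letting partial batches skew the tuning."""
        while self.pending:
            self.flush(adapt=False)

    def _adapt(self, rate: float) -> None:
        # If throughput dropped, reverse the search direction.
        if rate < self.last_rate:
            self.step = 1.0 / self.step
        self.last_rate = rate
        proposed = int(round(self.batch_size * self.step))
        self.batch_size = max(self.min_size, min(self.max_size, proposed))


def demo() -> List[int]:
    """Feed 100 units through a simulated accelerator; return the batch sizes used."""
    sizes: List[int] = []

    def simulated_launch(batch: List[int]) -> float:
        sizes.append(len(batch))
        return 1.0 + 0.1 * len(batch)  # assumed cost: fixed overhead + per-unit work

    agg = DynamicAgglomerator(simulated_launch)
    for unit in range(100):
        agg.submit(unit)
    agg.drain()
    return sizes
```

Under the assumed cost model, larger batches amortize the fixed offload overhead, so the batch size grows from its initial value until it reaches `max_size`. With a real accelerator, `launch` would time an actual kernel invocation, and the measured throughput, not a formula, would drive the adjustment.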

Original language: English (US)
Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Pages: 2404-2413
Number of pages: 10
State: Published - Oct 18 2012
Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 - Shanghai, China
Duration: May 21 2012 to May 25 2012

Publication series

Name: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

Other

Other: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Country: China
City: Shanghai
Period: 5/21/12 to 5/25/12

Keywords

  • CUDA
  • GPGPU
  • accelerator
  • adaptive runtime
  • agglomeration
  • dynamic scheduling

ASJC Scopus subject areas

  • Software
