Achieving Computation-Communication Overlap with Overdecomposition on GPU Systems

Jaemin Choi, David F. Richards, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The landscape of high performance computing is shifting towards a collection of multi-GPU nodes, widening the gap between on-node compute and off-node communication capabilities. Consequently, the ability to tolerate communication latencies and maximize utilization of the compute hardware is becoming increasingly important in achieving high performance. Overdecomposition has been successfully adopted on traditional CPU-based systems to achieve computation-communication overlap, significantly reducing the impact of communication on application performance. However, it has been unclear whether overdecomposition can provide the same benefits on modern GPU systems. In this work, we address the challenges in achieving computation-communication overlap with overdecomposition on GPU systems using the Charm++ parallel programming system. By prioritizing communication with CUDA streams in the application and supporting asynchronous progress of GPU operations in the Charm++ runtime system, we obtain improvements in overall performance of up to 50% and 47% with proxy applications Jacobi3D and MiniMD, respectively.

Original language: English (US)
Title of host publication: Proceedings of ESPM2 2020
Subtitle of host publication: 5th International IEEE Workshop on Extreme Scale Programming Models and Middleware, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781665422840
DOIs
State: Published - Nov 2020
Event: 5th IEEE/ACM International IEEE Workshop on Extreme Scale Programming Models and Middleware, ESPM2 2020 - Virtual, Atlanta, United States
Duration: Nov 11 2020 → …

Publication series

Name: Proceedings of ESPM2 2020: 5th International IEEE Workshop on Extreme Scale Programming Models and Middleware, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 5th IEEE/ACM International IEEE Workshop on Extreme Scale Programming Models and Middleware, ESPM2 2020
Country/Territory: United States
City: Virtual, Atlanta
Period: 11/11/20 → …

Keywords

  • GPU computing
  • asynchronous task-based runtime
  • computation-communication overlap
  • overdecomposition

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
