Improving Scalability with GPU-Aware Asynchronous Tasks

Jaemin Choi, David F. Richards, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Asynchronous tasks, when created with over-decomposition, enable automatic computation-communication overlap which can substantially improve performance and scalability. This is not only applicable to traditional CPU-based systems, but also to modern GPU-accelerated platforms. While the ability to hide communication behind computation can be highly effective in weak scaling scenarios, performance begins to suffer with smaller problem sizes or in strong scaling due to fine-grained overheads and reduced room for overlap. In this work, we integrate GPU-aware communication into asynchronous tasks in addition to computation-communication overlap, with the goal of reducing time spent in communication and further increasing GPU utilization. We demonstrate the performance impact of our approach using a proxy application that performs the Jacobi iterative method, Jacobi3D. In addition to optimizations to minimize synchronizations between the host and GPU devices and increase the concurrency of GPU operations, we explore techniques such as kernel fusion and CUDA Graphs to mitigate fine-grained overheads at scale.

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 569-578
Number of pages: 10
ISBN (Electronic): 9781665497473
State: Published - 2022
Event: 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 - Virtual, Online, France
Duration: May 30 2022 – Jun 3 2022

Publication series

Name: Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022

Conference

Conference: 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Country/Territory: France
City: Virtual, Online
Period: 5/30/22 – 6/3/22

Keywords

  • GPU-aware communication
  • asynchronous tasks
  • computation-communication overlap
  • overdecomposition
  • scalability

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software
  • Control and Optimization
