Characterizing the performance of node-aware strategies for irregular point-to-point communication on heterogeneous architectures

Shelby Lockhart, Amanda Bienz, William D. Gropp, Luke N. Olson

Research output: Contribution to journal › Article › peer-review


Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. Presented models suggest staging communicated data through host processes then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data leads to the most performant strategy when communicating with a high number of nodes. Model validation is provided via a case study of irregular point-to-point communication patterns in distributed sparse matrix–vector products. Importantly, we include a discussion on the implications model predictions have on communication strategy design for emerging supercomputer architectures.

Original language: English (US)
Article number: 103021
Journal: Parallel Computing
State: Published - Jul 2023
Externally published: Yes


Keywords

  • CUDA-aware
  • Communication
  • Data movement
  • GPU
  • GPUDirect
  • MPI
  • Parallel
  • Performance modeling
  • Sparse matrix

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

