Abstract
Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. Presented models suggest staging communicated data through host processes then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data leads to the most performant strategy when communicating with a high number of nodes. Model validation is provided via a case study of irregular point-to-point communication patterns in distributed sparse matrix–vector products. Importantly, we include a discussion on the implications model predictions have on communication strategy design for emerging supercomputer architectures.
Original language | English (US) |
---|---|
Article number | 103021 |
Journal | Parallel Computing |
Volume | 116 |
DOIs | |
State | Published - Jul 2023 |
Keywords
- CUDA-aware
- Communication
- Data movement
- GPU
- GPUDirect
- MPI
- Parallel
- Performance modeling
- Sparse matrix
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence