Abstract
Adaptive MPI (AMPI) is an implementation of the MPI standard that virtualizes ranks as user-level threads rather than OS processes. In this work, we optimize AMPI's communication performance based on the locality of the communicating endpoints within a cluster of SMP nodes. We distinguish point-to-point messages whose endpoints are co-located on the same execution unit from point-to-point messages whose endpoints reside in the same process but on different execution units. We demonstrate how the messaging semantics of Charm++ both enable and hinder AMPI's implementation in different ways, and we motivate extensions to Charm++ that address these limitations. Using the OSU micro-benchmark suite, we show that our locality-aware design offers lower latency and higher bandwidth, and reduces the memory footprint of applications.
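As a rough illustration of the locality distinction described above, the following Python sketch classifies a point-to-point message by where its two endpoints live. It assumes a hypothetical rank-to-PE table and PE-to-process table. The names `Locality`, `classify`, and `STRATEGY`, and the per-class delivery strategies, are illustrative assumptions, not AMPI's or Charm++'s API. The third class (endpoints in different processes) is the default case that the abstract leaves implicit.

```python
from enum import Enum
from typing import Sequence


class Locality(Enum):
    """Hypothetical locality classes for a point-to-point message."""
    SAME_PE = "same execution unit"
    SAME_PROCESS = "same process, different execution unit"
    OTHER_PROCESS = "different process"


def classify(rank_to_pe: Sequence[int], pe_to_process: Sequence[int],
             src: int, dst: int) -> Locality:
    """Classify a message from rank `src` to rank `dst` by endpoint locality."""
    src_pe, dst_pe = rank_to_pe[src], rank_to_pe[dst]
    if src_pe == dst_pe:
        # Both ranks are user-level threads scheduled on the same PE.
        return Locality.SAME_PE
    if pe_to_process[src_pe] == pe_to_process[dst_pe]:
        # Different PEs, but they share one OS process (and address space).
        return Locality.SAME_PROCESS
    return Locality.OTHER_PROCESS


# Assumed, illustrative strategies per class; not AMPI's actual code paths.
STRATEGY = {
    Locality.SAME_PE: "hand the buffer to the receiver directly",
    Locality.SAME_PROCESS: "copy through the shared address space",
    Locality.OTHER_PROCESS: "pack into a message and send via the runtime",
}


if __name__ == "__main__":
    # Four ranks virtualized on three PEs: PEs 0 and 1 are in process 0, PE 2 in process 1.
    rank_to_pe = [0, 0, 1, 2]
    pe_to_process = [0, 0, 1]
    for dst in (1, 2, 3):
        loc = classify(rank_to_pe, pe_to_process, 0, dst)
        print(f"rank 0 -> rank {dst}: {loc.value}; {STRATEGY[loc]}")
```

With this mapping, a message from rank 0 to rank 1 stays on one PE, rank 0 to rank 2 stays within one process, and rank 0 to rank 3 crosses processes. Classifying once per message lets each class take its own delivery path instead of routing every message through the general one.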
Original language | English (US) |
---|---|
Article number | e4467 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 32 |
Issue number | 3 |
DOIs | |
State | Published - Feb 10 2020 |
Keywords
- AMPI
- MPI
- endpoints
- intra-node communication
- shared memory optimizations
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Computer Science Applications
- Computer Networks and Communications
- Computational Theory and Mathematics