Abstract
Known algorithms for two important collective communication operations, allgather and reduce-scatter, are minimal-communication algorithms; no process sends or receives more than the minimum amount of data. This, combined with the data-ordering semantics of the operations, limits the flexibility and performance of these algorithms. Our novel non-minimal, topology-aware algorithms deliver far better performance with the addition of a very small amount of redundant communication. We develop novel algorithms for Clos networks and single or multi-ported torus networks. Tests on a 32k-node BlueGene/P result in allgather speedups of up to 6x and reduce-scatter speedups of over 11x compared to the native IBM algorithm. Broadcast, reduce, and allreduce can be composed of allgather or reduce-scatter and other collective operations; our techniques also improve the performance of these algorithms.
Original language | English (US) |
---|---|
Pages (from-to) | 45-54 |
Number of pages | 10 |
Journal | ACM SIGPLAN Notices |
Volume | 47 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2012 |
Keywords
- Collective-communication algorithms
ASJC Scopus subject areas
- General Computer Science