Collective algorithms for multiported torus networks

Paul Sack, William Gropp

Research output: Contribution to journalArticlepeer-review

Abstract

Modern supercomputers with torus networks allow each node to simultaneously pass messages on all of its links. However, most collective algorithms are designed to only use one link at a time. In this work, we present novel multiported algorithms for the scatter, gather, all-gather, and reduce-scatter operations. Our algorithms can be combined to create multiported reduce, all-reduce, and broadcast algorithms. Several of these algorithms involve a new technique where we relax the MPI message-ordering constraints to achieve high performance and restore the correct ordering using an additional stage of redundant communication. According to our models, on an n-dimensional torus, our algorithms should allow for nearly a 2n-fold improvement in communication performance compared to known, single-ported torus algorithms. In practice, we have achieved nearly 6x better performance on a 32k-node 3-dimensional torus.

Original languageEnglish (US)
Article numbera12
JournalACM Transactions on Parallel Computing
Volume1
Issue number2
DOIs
StatePublished - Jan 2015

Keywords

  • Collective algorithms
  • Message-passing

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Hardware and Architecture
  • Computer Science Applications
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'Collective algorithms for multiported torus networks'. Together they form a unique fingerprint.

Cite this