Framework for scalable intra-node collective operations using shared memory

Surabhi Jain, Rashid Kaleem, Marc Gamell Balmana, Akhil Langer, Dmitry Durnov, Alexander Sannikov, Maria Garzaran

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Collective operations are used in MPI programs to express common communication patterns, collective computations, or synchronization. In many collectives, such as MPI_Allreduce, the intra-node component of the collective lies on the critical path, because the inter-node communication cannot start until the intra-node component has completed. As the number of cores per node increases, intra-node optimizations that leverage shared memory become more important. In this paper, we focus on the performance benefit of optimizing intra-node collectives using POSIX shared memory for synchronization and data sharing. We implement several collectives using basic primitives, or steps, as building blocks. Key components of our implementation include a dedicated intra-node collectives layer, careful layout of the data structures, and optimizations that exploit the memory hierarchy to balance parallelism and latencies of data movement. A comparison of our implementation on top of MPICH shows significant performance speedups with respect to the original MPICH implementation, MVAPICH, and OpenMPI.
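
The intra-node pattern the abstract describes (ranks sharing data and synchronizing through a common memory region) can be illustrated with a minimal sketch. It is not the paper's implementation: threads stand in for MPI ranks on one node, a Python list stands in for a POSIX shared-memory segment, and the function name `shared_memory_allreduce` is hypothetical. In a multi-node run, the inter-node exchange would presumably sit between the leader's reduction and the final read, which is why the intra-node step is on the critical path.

```python
import threading


def shared_memory_allreduce(contributions):
    """Model an intra-node allreduce (sum) over a shared buffer.

    Assumptions for illustration only: threads play the role of MPI
    ranks, and plain lists play the role of a shared-memory segment.
    """
    n = len(contributions)
    slots = [0] * n          # one slot per rank in the "shared segment"
    total = [0]              # reduced value, written by the leader only
    results = [None] * n     # value each rank reads back
    barrier = threading.Barrier(n)

    def rank_body(rank):
        slots[rank] = contributions[rank]  # step 1: publish local data
        barrier.wait()                     # all contributions are visible
        if rank == 0:                      # step 2: leader reduces
            total[0] = sum(slots)
        barrier.wait()                     # reduced value is visible
        results[rank] = total[0]           # step 3: every rank reads it

    threads = [threading.Thread(target=rank_body, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(shared_memory_allreduce([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

The two barriers mark the synchronization points: no rank may read the result before the leader has written it, and the leader may not reduce before every slot is filled.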

Original language: English (US)
Title of host publication: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 374-385
Number of pages: 12
ISBN (Electronic): 9781538683842
State: Published - Jul 2 2018
Externally published: Yes
Event: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States
Duration: Nov 11 2018 → Nov 16 2018

Publication series

Name: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

Conference

Conference: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Country/Territory: United States
City: Dallas
Period: 11/11/18 → 11/16/18

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Theoretical Computer Science
