Nonblocking Epochs in MPI One-Sided Communication

Judicael A. Zounmevo, Xin Zhao, Pavan Balaji, William D Gropp, Ahmad Afsahi

Research output: Contribution to journal (Conference article)

Abstract

The synchronization model of the MPI one-sided communication paradigm can lead to serialization and latency propagation. For instance, a process can propagate non-RMA communication-related latencies to remote peers waiting in their respective epoch-closing routines in matching epochs. In this work, we discuss six latency issues that were documented for MPI-2.0 and show how they evolved in MPI-3.0. Then, we propose entirely nonblocking RMA synchronizations that allow processes to avoid waiting even in epoch-closing routines. The proposal provides contention avoidance in communication patterns that require back-to-back RMA epochs. It also fixes the latency propagation issues. Moreover, it allows the MPI progress engine to orchestrate aggressive scheduling to cut down the overall completion time of sets of epochs without introducing memory consistency hazards. Our test results show noticeable performance improvements for a lower-upper matrix decomposition as well as an application pattern that performs massive atomic updates.
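
As an illustration of the latency propagation described in the abstract, the following Python sketch models one origin/target pair in a single matching epoch. It is a toy timeline under stated assumptions, not the authors' implementation and not an MPI program: the function name `target_finish_time` and all of its parameters are hypothetical, and the model assumes that the target's epoch-closing call can return only after the origin has closed its own epoch.

```python
def target_finish_time(rma_time, origin_delay, target_work, nonblocking):
    """Return when the target has both closed its epoch and finished its own work.

    rma_time     -- time the origin spends on RMA transfers (hypothetical units)
    origin_delay -- non-RMA latency on the origin before it closes its epoch
    target_work  -- unrelated work the target wants to do
    nonblocking  -- True models a nonblocking epoch-closing routine on the target
    """
    # Assumption: the target's epoch cannot complete before the origin closes.
    origin_close = rma_time + origin_delay

    if nonblocking:
        # The target starts closing, then overlaps its own work with the wait.
        return max(origin_close, target_work)

    # Blocking close: the target inherits the origin's delay, then does its work.
    return origin_close + target_work


blocking = target_finish_time(1.0, 5.0, 4.0, nonblocking=False)
overlapped = target_finish_time(1.0, 5.0, 4.0, nonblocking=True)
print(blocking, overlapped)  # prints "10.0 6.0"
```

In this model the origin's 5.0 units of unrelated delay are added in full to the blocking target's timeline, whereas the nonblocking variant hides the part of the wait that overlaps with the target's own work.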

Original language: English (US)
Article number: 7013026
Pages (from-to): 475-486
Number of pages: 12
Journal: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2015-January
Issue number: January
DOI: 10.1109/SC.2014.44
State: Published - Jan 16 2014
Event: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014 - New Orleans, United States
Duration: Nov 16 2014 to Nov 21 2014

Keywords

  • MPI
  • RMA
  • latency propagation
  • nonblocking synchronizations
  • one-sided

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Nonblocking Epochs in MPI One-Sided Communication. / Zounmevo, Judicael A.; Zhao, Xin; Balaji, Pavan; Gropp, William D; Afsahi, Ahmad.

In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Vol. 2015-January, No. January, 7013026, 16.01.2014, p. 475-486.

@article{12c00777e0344ed9a5761eed8797081b,
title = "Nonblocking Epochs in MPI One-Sided Communication",
abstract = "The synchronization model of the MPI one-sided communication paradigm can lead to serialization and latency propagation. For instance, a process can propagate non-RMA communication-related latencies to remote peers waiting in their respective epoch-closing routines in matching epochs. In this work, we discuss six latency issues that were documented for MPI-2.0 and show how they evolved in MPI-3.0. Then, we propose entirely nonblocking RMA synchronizations that allow processes to avoid waiting even in epoch-closing routines. The proposal provides contention avoidance in communication patterns that require back to back RMA epochs. It also fixes the latency propagation issues. Moreover, it allows the MPI progress engine to orchestrate aggressive schedulings to cut down the overall completion time of sets of epochs without introducing memory consistency hazards. Our test results show noticeable performance improvements for a lower-upper matrix decomposition as well as an application pattern that performs massive atomic updates.",
keywords = "MPI, RMA, latency propagation, nonblocking synchronizations, one-sided",
author = "Zounmevo, {Judicael A.} and Xin Zhao and Pavan Balaji and Gropp, {William D} and Ahmad Afsahi",
year = "2014",
month = "1",
day = "16",
doi = "10.1109/SC.2014.44",
language = "English (US)",
volume = "2015-January",
pages = "475--486",
journal = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
issn = "2167-4329",
number = "January",

}

TY - JOUR

T1 - Nonblocking Epochs in MPI One-Sided Communication

AU - Zounmevo, Judicael A.

AU - Zhao, Xin

AU - Balaji, Pavan

AU - Gropp, William D

AU - Afsahi, Ahmad

PY - 2014/1/16

Y1 - 2014/1/16

AB - The synchronization model of the MPI one-sided communication paradigm can lead to serialization and latency propagation. For instance, a process can propagate non-RMA communication-related latencies to remote peers waiting in their respective epoch-closing routines in matching epochs. In this work, we discuss six latency issues that were documented for MPI-2.0 and show how they evolved in MPI-3.0. Then, we propose entirely nonblocking RMA synchronizations that allow processes to avoid waiting even in epoch-closing routines. The proposal provides contention avoidance in communication patterns that require back to back RMA epochs. It also fixes the latency propagation issues. Moreover, it allows the MPI progress engine to orchestrate aggressive schedulings to cut down the overall completion time of sets of epochs without introducing memory consistency hazards. Our test results show noticeable performance improvements for a lower-upper matrix decomposition as well as an application pattern that performs massive atomic updates.

KW - MPI

KW - RMA

KW - latency propagation

KW - nonblocking synchronizations

KW - one-sided

UR - http://www.scopus.com/inward/record.url?scp=84936929443&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84936929443&partnerID=8YFLogxK

U2 - 10.1109/SC.2014.44

DO - 10.1109/SC.2014.44

M3 - Conference article

AN - SCOPUS:84936929443

VL - 2015-January

SP - 475

EP - 486

JO - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

JF - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

SN - 2167-4329

IS - January

M1 - 7013026

ER -