TY - JOUR
T1 - Preparing MPICH for exascale
AU - Guo, Yanfei
AU - Raffenetti, Ken
AU - Zhou, Hui
AU - Balaji, Pavan
AU - Si, Min
AU - Amer, Abdelhalim
AU - Iwasaki, Shintaro
AU - Seo, Sangmin
AU - Congiu, Giuseppe
AU - Latham, Robert
AU - Oden, Lena
AU - Gillis, Thomas
AU - Zambre, Rohit
AU - Ouyang, Kaiming
AU - Archer, Charles
AU - Bland, Wesley
AU - Jose, Jithin
AU - Sur, Sayantan
AU - Fujita, Hajime
AU - Durnov, Dmitry
AU - Chuvelev, Michael
AU - Zheng, Gengbin
AU - Brooks, Alex
AU - Thapaliya, Sagar
AU - Doodi, Taru
AU - Garazan, Maria
AU - Oyanagi, Steve
AU - Snir, Marc
AU - Thakur, Rajeev
N1 - This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, and by the U.S. Department of Energy, Office of Science, under Contract DE-AC02-06CH11357. This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided by the Laboratory Computing Resource Center (LCRC) and the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory. This research also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We also thank other developers and users for their code contributions and experience reports. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Exascale Computing Project (17-SC-20-SC), U.S. Department of Energy, Office of Science (DE-AC02-06CH11357), U.S. DOE Office of Science Advanced Scientific Computing Research Program (DE-AC02-06CH11357), U.S. Department of Energy (DE-AC05-00OR22725).
PY - 2025/3
Y1 - 2025/3
AB - The advent of exascale supercomputers heralds a new era of scientific discovery, yet it introduces significant architectural challenges that must be overcome for MPI applications to fully exploit their potential. Among these challenges is the adoption of heterogeneous architectures, particularly the integration of GPUs to accelerate computation. Additionally, the complexity of multithreaded programming models has become a critical factor in achieving performance at scale. The efficient utilization of hardware acceleration for communication, provided by modern NICs, is also essential for achieving low-latency and high-throughput communication in such complex systems. In response to these challenges, the MPICH library, a high-performance and widely used Message Passing Interface (MPI) implementation, has undergone significant enhancements. This paper presents four major contributions that prepare MPICH for the exascale transition. First, we describe a lightweight communication stack that leverages the advanced features of modern NICs to maximize hardware acceleration. Second, our work showcases a highly scalable multithreaded communication model that addresses the complexities of concurrent environments. Third, we introduce GPU-aware communication capabilities that optimize data movement in GPU-integrated systems. Finally, we present a new datatype engine aimed at accelerating the use of MPI derived datatypes on GPUs. These improvements in the MPICH library not only address the immediate needs of exascale computing architectures but also set a foundation for exploiting future innovations in high-performance computing. By embracing these new designs and approaches, MPICH-derived libraries from HPE Cray and Intel were able to achieve real exascale performance on OLCF Frontier and ALCF Aurora, respectively.
KW - HPC communication
KW - HPC network
KW - MPI
KW - Message Passing Interface
KW - exascale MPI
UR - http://www.scopus.com/inward/record.url?scp=105001587661&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105001587661&partnerID=8YFLogxK
U2 - 10.1177/10943420241311608
DO - 10.1177/10943420241311608
M3 - Article
AN - SCOPUS:105001587661
SN - 1094-3420
VL - 39
SP - 283
EP - 305
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 2
ER -