TY - GEN
T1 - Minimizing MPI resource contention in multithreaded multicore environments
AU - Goodell, David
AU - Balaji, Pavan
AU - Buntinas, Darius
AU - Dózsa, Gábor
AU - Gropp, William
AU - Kumar, Sameer
AU - de Supinski, Bronis R.
AU - Thakur, Rajeev
N1 - Copyright: Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
AB - With the ever-increasing numbers of cores per node in high-performance computing systems, a growing number of applications are using threads to exploit shared memory within a node and MPI across nodes. This hybrid programming model needs efficient support for multithreaded MPI communication. In this paper, we describe the optimization of one aspect of a multithreaded MPI implementation: concurrent accesses from multiple threads to various MPI objects, such as communicators, datatypes, and requests. The semantics of the creation, usage, and destruction of these objects implies, but does not strictly require, the use of reference counting to prevent memory leaks and premature object destruction. We demonstrate how a naïve multithreaded implementation of MPI object management via reference counting incurs a significant performance penalty. We then detail two solutions that we have implemented in MPICH2 to mitigate this problem almost entirely, including one based on a novel garbage collection scheme. In our performance experiments, this new scheme improved the multithreaded messaging rate by up to 31% over the naïve reference counting method.
UR - http://www.scopus.com/inward/record.url?scp=78649491659&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649491659&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2010.11
DO - 10.1109/CLUSTER.2010.11
M3 - Conference contribution
AN - SCOPUS:78649491659
SN - 9780769542201
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 1
EP - 8
BT - Proceedings - 2010 IEEE International Conference on Cluster Computing, Cluster 2010
ER -