TY - GEN
T1 - Enabling concurrent multithreaded MPI communication on multicore petascale systems
AU - Dózsa, Gábor
AU - Kumar, Sameer
AU - Balaji, Pavan
AU - Buntinas, Darius
AU - Goodell, David
AU - Gropp, William
AU - Ratterman, Joe
AU - Thakur, Rajeev
N1 - Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
AB - With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results that demonstrate that our new design improves the multithreaded message rate by a factor of 3.6 compared with the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.
UR - http://www.scopus.com/inward/record.url?scp=78149276973&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78149276973&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15646-5_2
DO - 10.1007/978-3-642-15646-5_2
M3 - Conference contribution
AN - SCOPUS:78149276973
SN - 3642156452
SN - 9783642156458
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 11
EP - 20
BT - Recent Advances in the Message Passing Interface - 17th European MPI Users' Group Meeting, EuroMPI 2010, Proceedings
T2 - 17th European MPI Users' Group Meeting, EuroMPI 2010
Y2 - 12 September 2010 through 15 September 2010
ER -