Enabling concurrent multithreaded MPI communication on multicore petascale systems

Gábor Dózsa, Sameer Kumar, Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Joe Ratterman, Rajeev Thakur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results that demonstrate that our new design improves the multithreaded message rate by a factor of 3.6 compared with the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.
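The abstract names fine-grained locks and per-channel queues as the means of allowing concurrent access while preserving message order. The sketch below is only a toy model of that general idea, not the MPICH2 or Blue Gene/P implementation. The class `MultiChannelQueue`, the function `run_demo`, and the rule that maps a source to a channel are all hypothetical names and assumptions introduced for illustration. It shows that if every message from a given source always goes through the same channel, each channel can be protected by its own lock and the per-source order is still preserved.

```python
import threading
from collections import deque


class MultiChannelQueue:
    """Toy model: one FIFO and one lock per channel (hypothetical, not the MPICH2 design)."""

    def __init__(self, num_channels):
        self.locks = [threading.Lock() for _ in range(num_channels)]
        self.queues = [deque() for _ in range(num_channels)]

    def channel_for(self, source):
        # Assumption: all messages from one source map to one channel,
        # so a single FIFO decides their relative order.
        return source % len(self.queues)

    def post(self, source, seq):
        ch = self.channel_for(source)
        with self.locks[ch]:  # fine-grained: only this channel is locked
            self.queues[ch].append((source, seq))

    def drain(self, ch):
        # Remove and return everything queued on one channel, in FIFO order.
        with self.locks[ch]:
            items = list(self.queues[ch])
            self.queues[ch].clear()
        return items


def run_demo(num_channels=4, num_sources=8, msgs_per_source=1000):
    """Post messages from several threads, then group them by source."""
    q = MultiChannelQueue(num_channels)

    def sender(source):
        for seq in range(msgs_per_source):
            q.post(source, seq)

    threads = [threading.Thread(target=sender, args=(s,)) for s in range(num_sources)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    received = {s: [] for s in range(num_sources)}
    for ch in range(num_channels):
        for source, seq in q.drain(ch):
            received[source].append(seq)
    return received
```

Calling `run_demo()` returns a dictionary keyed by source. In this model, each list should hold that source's sequence numbers in the order they were posted, even though threads mapped to different channels never contend for the same lock.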

Original language: English (US)
Title of host publication: Recent Advances in the Message Passing Interface - 17th European MPI Users' Group Meeting, EuroMPI 2010, Proceedings
Number of pages: 10
State: Published - 2010
Event: 17th European MPI Users' Group Meeting, EuroMPI 2010 - Stuttgart, Germany
Duration: Sep 12 2010 to Sep 15 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6305 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 17th European MPI Users' Group Meeting, EuroMPI 2010

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

