TY - GEN
T1 - Natively supporting true one-sided communication in MPI on multi-core systems with InfiniBand
AU - Santhanaraman, G.
AU - Balaji, P.
AU - Gopalakrishnan, K.
AU - Thakur, R.
AU - Gropp, W.
AU - Panda, D. K.
PY - 2009
Y1 - 2009
N2 - As high-end computing systems continue to grow in scale, the performance that applications can achieve on such large-scale systems depends heavily on their ability to avoid explicitly synchronized communication with other processes in the system. Accordingly, several modern and legacy parallel programming models (such as MPI, UPC, and Global Arrays) provide programming constructs that enable implicit communication using one-sided communication operations. While MPI is the most widely used communication model for scientific computing, the usage of one-sided communication is restricted; this is mainly owing to inefficiencies in current MPI implementations that internally rely on synchronization between processes even during one-sided communication, thus losing the potential of such constructs. In our previous work, we utilized native one-sided communication primitives offered by high-speed networks such as InfiniBand (IB) to allow for true one-sided communication in MPI. In this paper, we extend this work to natively take advantage of one-sided atomic operations on cache-coherent multi-core/multi-processor architectures while still utilizing the benefits of networks such as IB. Specifically, we present a sophisticated hybrid design that uses locks that migrate between IB hardware atomics and multi-core CPU atomics to take advantage of both. We demonstrate the capability of our proposed design with a wide range of experiments illustrating its benefits in performance as well as its potential to avoid explicit synchronization.
AB - As high-end computing systems continue to grow in scale, the performance that applications can achieve on such large-scale systems depends heavily on their ability to avoid explicitly synchronized communication with other processes in the system. Accordingly, several modern and legacy parallel programming models (such as MPI, UPC, and Global Arrays) provide programming constructs that enable implicit communication using one-sided communication operations. While MPI is the most widely used communication model for scientific computing, the usage of one-sided communication is restricted; this is mainly owing to inefficiencies in current MPI implementations that internally rely on synchronization between processes even during one-sided communication, thus losing the potential of such constructs. In our previous work, we utilized native one-sided communication primitives offered by high-speed networks such as InfiniBand (IB) to allow for true one-sided communication in MPI. In this paper, we extend this work to natively take advantage of one-sided atomic operations on cache-coherent multi-core/multi-processor architectures while still utilizing the benefits of networks such as IB. Specifically, we present a sophisticated hybrid design that uses locks that migrate between IB hardware atomics and multi-core CPU atomics to take advantage of both. We demonstrate the capability of our proposed design with a wide range of experiments illustrating its benefits in performance as well as its potential to avoid explicit synchronization.
UR - http://www.scopus.com/inward/record.url?scp=70349740809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349740809&partnerID=8YFLogxK
U2 - 10.1109/CCGRID.2009.85
DO - 10.1109/CCGRID.2009.85
M3 - Conference contribution
AN - SCOPUS:70349740809
SN - 9780769536224
T3 - 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009
SP - 380
EP - 387
BT - 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009
PB - IEEE Computer Society
T2 - 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2009
Y2 - 18 May 2009 through 21 May 2009
ER -