TY - GEN
T1 - Software combining to mitigate multithreaded MPI contention
AU - Amer, Abdelhalim
AU - Archer, Charles
AU - Blocksome, Michael
AU - Cao, Chongxiao
AU - Chuvelev, Michael
AU - Fujita, Hajime
AU - Garzaran, Maria
AU - Guo, Yanfei
AU - Hammond, Jeff R.
AU - Iwasaki, Shintaro
AU - Raffenetti, Kenneth J.
AU - Shiryaev, Mikhail
AU - Si, Min
AU - Taura, Kenjiro
AU - Thapaliya, Sagar
AU - Balaji, Pavan
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/6/26
Y1 - 2019/6/26
AB - Efforts to mitigate lock contention from concurrent threaded accesses to MPI have reduced contention through fine-grained locking, avoided locking altogether by offloading communication to dedicated threads, or alleviated negative side effects from contention by using better lock management protocols. The blocking nature of lock-based methods, however, wastes the asynchrony benefits of nonblocking MPI operations, and the offloading model sacrifices CPU resources and incurs unnecessary software offloading overheads under low contention. We propose new thread safety models, CSync and LockQ, based on software combining, a form of software offloading without the requirement for dedicated threads; a thread holding the lock combines work of threads that failed their lock acquisitions. We demonstrate that CSync, a direct application of software combining, improves scalability but suffers from lack of asynchrony and incurs unnecessary offloading. LockQ alleviates these shortcomings by leveraging MPI semantics to relax synchronization and reduce offloading requirements. We present the implementation, analysis, and evaluation of these models on a modern network fabric and show that LockQ outperforms most existing thread safety models in low- and high-contention regimes.
UR - http://www.scopus.com/inward/record.url?scp=85074504686&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074504686&partnerID=8YFLogxK
U2 - 10.1145/3330345.3330378
DO - 10.1145/3330345.3330378
M3 - Conference contribution
AN - SCOPUS:85074504686
T3 - Proceedings of the International Conference on Supercomputing
SP - 367
EP - 379
BT - ICS 2019 - International Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019
Y2 - 26 June 2019
ER -