TY - GEN
T1 - Design and Analysis of the Network Software Stack of an Asynchronous Many-task System - The LCI parcelport of HPX
AU - Yan, Jiakun
AU - Kaiser, Hartmut
AU - Snir, Marc
N1 - The authors would like to thank Patrick Diehl and Gregor Daiss for their advice on compiling and running Octo-Tiger and the Center of Computation and Technology at Louisiana State University for providing access to its Rostam cluster. This research was supported in part by NSF grants 1908144 and 1909015. This work used the Expanse system at the San Diego Supercomputer Center through ACCESS allocation CCR130058.
PY - 2023/11/12
Y1 - 2023/11/12
N2 - The HPX asynchronous many-task runtime system has been using TCP and MPI as its communication backends (parcelports). We developed a new HPX parcelport using a new communication library, the Lightweight Communication Interface (LCI) that was designed to better match the needs of systems such as HPX. We evaluate its performance with various microbenchmarks and a real-world astrophysics application, Octo-Tiger. Compared to the best configuration of the MPI parcelport, microbenchmarks show that the new LCI parcelport improves the message rate by up to 30x and decreases latencies by up to 5x. It also reduces the total execution time of Octo-Tiger by up to 1.175x compared to the best configuration of the MPI parcelport and up to 13.6x compared to the same configuration of the MPI parcelport. We discuss the performance impacts of different design choices.
AB - The HPX asynchronous many-task runtime system has been using TCP and MPI as its communication backends (parcelports). We developed a new HPX parcelport using a new communication library, the Lightweight Communication Interface (LCI) that was designed to better match the needs of systems such as HPX. We evaluate its performance with various microbenchmarks and a real-world astrophysics application, Octo-Tiger. Compared to the best configuration of the MPI parcelport, microbenchmarks show that the new LCI parcelport improves the message rate by up to 30x and decreases latencies by up to 5x. It also reduces the total execution time of Octo-Tiger by up to 1.175x compared to the best configuration of the MPI parcelport and up to 13.6x compared to the same configuration of the MPI parcelport. We discuss the performance impacts of different design choices.
KW - asynchronous many-task systems
KW - communication libraries
KW - multithreaded message passing
UR - http://www.scopus.com/inward/record.url?scp=85178152905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178152905&partnerID=8YFLogxK
U2 - 10.1145/3624062.3624598
DO - 10.1145/3624062.3624598
M3 - Conference contribution
AN - SCOPUS:85178152905
T3 - ACM International Conference Proceeding Series
SP - 1151
EP - 1161
BT - Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
PB - Association for Computing Machinery
T2 - 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Y2 - 12 November 2023 through 17 November 2023
ER -