TY - GEN
T1 - Accelerating Messages by Avoiding Copies in an Asynchronous Task-based Programming Model
AU - Bhat, Nitin
AU - White, Sam
AU - Ramos, Evan
AU - Kale, Laxmikant V.
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Task-based programming models promise improved communication performance for irregular, fine-grained, and load-imbalanced applications. They do so by relaxing some of the messaging semantics of stricter models and taking advantage of those at the lower levels of the software stack. For example, while MPI's two-sided communication model guarantees in-order delivery, requires matching sends to receives, and has the user schedule communication, task-based models generally favor the runtime system scheduling all execution based on the dependencies and message deliveries as they happen. The messaging semantics are critical to enabling high performance. In this paper, we build on previous work that added zero-copy semantics to Converse/LRTS. We examine the messaging semantics of Charm++ as it relates to large message buffers, identify shortcomings, and define new communication APIs to address them. Our work enables in-place communication semantics in the context of point-to-point messaging, broadcasts, transmission of read-only variables at program startup, and for migration of chares. We showcase the performance of our new communication APIs using benchmarks for Charm++ and Adaptive MPI, which result in nearly 90% latency improvement and 2x lower peak memory usage.
AB - Task-based programming models promise improved communication performance for irregular, fine-grained, and load-imbalanced applications. They do so by relaxing some of the messaging semantics of stricter models and taking advantage of those at the lower levels of the software stack. For example, while MPI's two-sided communication model guarantees in-order delivery, requires matching sends to receives, and has the user schedule communication, task-based models generally favor the runtime system scheduling all execution based on the dependencies and message deliveries as they happen. The messaging semantics are critical to enabling high performance. In this paper, we build on previous work that added zero-copy semantics to Converse/LRTS. We examine the messaging semantics of Charm++ as it relates to large message buffers, identify shortcomings, and define new communication APIs to address them. Our work enables in-place communication semantics in the context of point-to-point messaging, broadcasts, transmission of read-only variables at program startup, and for migration of chares. We showcase the performance of our new communication APIs using benchmarks for Charm++ and Adaptive MPI, which result in nearly 90% latency improvement and 2x lower peak memory usage.
KW - AMPI
KW - Asynchronous Tasking
KW - Charm++
KW - Communication Optimizations
KW - Parallel Programming
KW - RDMA
UR - http://www.scopus.com/inward/record.url?scp=85124274080&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124274080&partnerID=8YFLogxK
U2 - 10.1109/ESPM254806.2021.00007
DO - 10.1109/ESPM254806.2021.00007
M3 - Conference contribution
AN - SCOPUS:85124274080
T3 - Proceedings of ESPM2 2021: 6th International IEEE Workshop on Extreme Scale Programming Models and Middleware, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 10
EP - 19
BT - Proceedings of ESPM2 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International IEEE/ACM Workshop on Extreme Scale Programming Models and Middleware, ESPM2 2021
Y2 - 15 November 2021
ER -