TY - GEN
T1 - MPMD Framework for Offloading Load Balance Computation
AU - Pearce, Olga
AU - Gamblin, Todd
AU - de Supinski, Bronis R.
AU - Schulz, Martin
AU - Amato, Nancy M.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/7/18
Y1 - 2016/7/18
AB - In many parallel scientific simulations, work is assigned to processors by decomposing a spatial domain consisting of mesh cells, particles, or other elements. When work per element changes, simulations can use dynamic load balance algorithms to distribute work to processors evenly. Typical SPMD simulations wait while a load balance algorithm runs on all processors, but this algorithm can itself become a bottleneck. We propose a novel approach based on two key observations: (1) application state typically changes slowly in SPMD physics simulations, so work assignments computed in the past still produce good load balance in the future, (2) we can decouple the load balance algorithm so that it runs concurrently with the application and more efficiently on a smaller number of processors. We then apply the work assignment "late", once it has been computed. We call this approach lazy load balancing. In this paper, we show that the rate of change in work distribution is slow for a Barnes-Hut benchmark and for ParaDiS, a dislocation dynamics simulation. We implement an MPMD framework to exploit this property to save resources by running a load balancing algorithm at higher parallel efficiency on a smaller number of processors. Using our framework, we explore the trade-offs of lazy load balancing and demonstrate performance improvements of up to 46%.
KW - Graph partitioning
KW - Load balancing
KW - Parallel simulations
UR - http://www.scopus.com/inward/record.url?scp=84983335541&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983335541&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2016.16
DO - 10.1109/IPDPS.2016.16
M3 - Conference contribution
AN - SCOPUS:84983335541
T3 - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
SP - 943
EP - 952
BT - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016
Y2 - 23 May 2016 through 27 May 2016
ER -