TY - GEN
T1 - Hierarchical load balancing for Charm++ applications on large supercomputers
AU - Zheng, Gengbin
AU - Meneses, Esteban
AU - Bhatelé, Abhinav
AU - Kalé, Laxmikant V.
PY - 2010
Y1 - 2010
N2 - Large parallel machines with hundreds of thousands of processors are being built. Recent studies have shown that ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and poor solutions of traditional distributed schemes. This is done by creating multiple levels of aggressive load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We present techniques to deal with scalability challenges of load balancing at very large scale. We show performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at TACC) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on the Blue Gene/P machine at ANL.
AB - Large parallel machines with hundreds of thousands of processors are being built. Recent studies have shown that ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and poor solutions of traditional distributed schemes. This is done by creating multiple levels of aggressive load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We present techniques to deal with scalability challenges of load balancing at very large scale. We show performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at TACC) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on the Blue Gene/P machine at ANL.
UR - http://www.scopus.com/inward/record.url?scp=78649896816&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649896816&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2010.65
DO - 10.1109/ICPPW.2010.65
M3 - Conference contribution
AN - SCOPUS:78649896816
SN - 9780769541570
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 436
EP - 444
BT - Proceedings - 2010 39th International Conference on Parallel Processing Workshops, ICPPW 2010
T2 - 2010 39th International Conference on Parallel Processing Workshops, ICPPW 2010
Y2 - 13 September 2010 through 16 September 2010
ER -