TY - GEN
T1 - A hierarchical approach for load balancing on parallel multi-core systems
AU - Pilla, Laércio L.
AU - Ribeiro, Christiane Pousa
AU - Cordeiro, Daniel
AU - Mei, Chao
AU - Bhatele, Abhinav
AU - Navaux, Philippe O.A.
AU - Broquedis, François
AU - Méhaut, Jean François
AU - Kale, Laxmikant V.
PY - 2012
Y1 - 2012
N2 - Multi-core compute nodes with non-uniform memory access (NUMA) are now a common architecture in the assembly of large-scale parallel machines. On these machines, in addition to the network communication costs, the memory access costs within a compute node are also asymmetric. Ignoring this asymmetry can lead to an increase in the data movement costs. Therefore, to fully exploit the potential of these nodes and reduce data access costs, it becomes crucial to have a complete view of the machine topology (i.e., the compute node topology and the interconnection network among the nodes). Furthermore, the parallel application behavior plays an important role in determining how to utilize the machine efficiently. In this paper, we propose a hierarchical load balancing approach to improve the performance of applications on parallel multi-core systems. We introduce NucoLB, a topology-aware load balancer that focuses on redistributing work while reducing communication costs among and within compute nodes. NucoLB takes the asymmetric memory access costs present on NUMA multi-core compute nodes, the interconnection network overheads, and the application communication patterns into account in its balancing decisions. We have implemented NucoLB using the Charm++ parallel runtime system and evaluated its performance. Results show that our load balancer improves performance by up to 20% when compared to state-of-the-art load balancers on three different NUMA parallel machines.
KW - cluster
KW - load balancing
KW - memory affinity
KW - multi-core
KW - non-uniform memory access
KW - topology
UR - https://www.scopus.com/pages/publications/84871147217
UR - https://www.scopus.com/pages/publications/84871147217#tab=citedBy
U2 - 10.1109/ICPP.2012.9
DO - 10.1109/ICPP.2012.9
M3 - Conference contribution
AN - SCOPUS:84871147217
SN - 9780769547961
T3 - Proceedings of the International Conference on Parallel Processing
SP - 118
EP - 127
BT - Proceedings - 41st International Conference on Parallel Processing, ICPP 2012
T2 - 41st International Conference on Parallel Processing, ICPP 2012
Y2 - 10 September 2012 through 13 September 2012
ER -