TY - GEN
T1 - Dynamic load balancing in GPU-based systems for a MPI program
AU - Fazenda, Alvaro Luiz
AU - Mendes, Celso L.
AU - Kale, Laxmikant V.
AU - Panetta, Jairo
AU - Rodrigues, Eduardo Rocha
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/18
Y1 - 2014/9/18
N2 - The dynamic load-balancing framework Charm++/AMPI, developed at the University of Illinois, is based on processor virtualization to allow thread migration across processors. This framework has been successfully applied to many scientific applications in the past, such as BRAMS, NAMD, ChaNGa, and others. Most of these applications use only CPUs, that is, they do not use accelerators. However, the use of GPUs to improve computational performance is rapidly becoming widespread in the high-performance computing community. This paper aims to investigate how the same Charm++/AMPI framework can be extended to balance load in a synthetic application inspired by the BRAMS numerical forecast model, running on GPUs instead of CPUs. Many major questions involving the use of GPUs with AMPI were handled in this work, including: how to measure the GPU's load, how to use and share GPUs among user-level threads, and what results are obtained when applying the required over-decomposition technique to a GPU-accelerated program.
AB - The dynamic load-balancing framework Charm++/AMPI, developed at the University of Illinois, is based on processor virtualization to allow thread migration across processors. This framework has been successfully applied to many scientific applications in the past, such as BRAMS, NAMD, ChaNGa, and others. Most of these applications use only CPUs, that is, they do not use accelerators. However, the use of GPUs to improve computational performance is rapidly becoming widespread in the high-performance computing community. This paper aims to investigate how the same Charm++/AMPI framework can be extended to balance load in a synthetic application inspired by the BRAMS numerical forecast model, running on GPUs instead of CPUs. Many major questions involving the use of GPUs with AMPI were handled in this work, including: how to measure the GPU's load, how to use and share GPUs among user-level threads, and what results are obtained when applying the required over-decomposition technique to a GPU-accelerated program.
KW - General-Purpose computation on Graphics Processing Units (GPGPU)
KW - Load Balancing and Sharing
UR - http://www.scopus.com/inward/record.url?scp=84908654565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908654565&partnerID=8YFLogxK
U2 - 10.1109/HPCSim.2014.6903681
DO - 10.1109/HPCSim.2014.6903681
M3 - Conference contribution
AN - SCOPUS:84908654565
T3 - Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014
SP - 154
EP - 161
BT - Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014
A2 - Smari, Waleed
A2 - Zeljkovic, Vesna
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 International Conference on High Performance Computing and Simulation, HPCS 2014
Y2 - 21 July 2014 through 25 July 2014
ER -