TY - GEN
T1 - G-Charm
T2 - 27th ACM International Conference on Supercomputing, ICS 2013
AU - Vasudevan, R.
AU - Vadhiyar, Sathish S.
AU - Kalé, Laxmikant V.
PY - 2013
Y1 - 2013
N2 - The effective use of GPUs for accelerating applications depends on a number of factors including effective asynchronous use of heterogeneous resources, reducing memory transfer between CPU and GPU, increasing occupancy of GPU kernels, overlapping data transfers with computations, reducing GPU idling, and kernel optimizations. Overcoming these challenges requires considerable effort on the part of the application developers, and most optimization strategies are often proposed and tuned specifically for individual applications. In this paper, we present G-Charm, a generic framework with an adaptive runtime system for efficient execution of message-driven parallel applications on hybrid systems. The framework is based on Charm++, a message-driven programming environment and runtime for parallel applications. The techniques in our framework include dynamic scheduling of work on CPU and GPU cores, maximizing reuse of data present in GPU memory, data management in GPU memory, and combining multiple kernels. We have presented results using our framework on Tesla S1070 and Fermi C2070 systems using three classes of applications: a highly regular and parallel 2D Jacobi solver, a regular dense matrix Cholesky factorization representing linear algebra computations with dependencies among parallel computations, and highly irregular molecular dynamics simulations. With our generic framework, we obtain 1.5 to 15 times improvement over the previous GPU-based implementation of Charm++. We also obtain about 14% improvement over an implementation of Cholesky factorization with a static work-distribution scheme.
AB - The effective use of GPUs for accelerating applications depends on a number of factors including effective asynchronous use of heterogeneous resources, reducing memory transfer between CPU and GPU, increasing occupancy of GPU kernels, overlapping data transfers with computations, reducing GPU idling, and kernel optimizations. Overcoming these challenges requires considerable effort on the part of the application developers, and most optimization strategies are often proposed and tuned specifically for individual applications. In this paper, we present G-Charm, a generic framework with an adaptive runtime system for efficient execution of message-driven parallel applications on hybrid systems. The framework is based on Charm++, a message-driven programming environment and runtime for parallel applications. The techniques in our framework include dynamic scheduling of work on CPU and GPU cores, maximizing reuse of data present in GPU memory, data management in GPU memory, and combining multiple kernels. We have presented results using our framework on Tesla S1070 and Fermi C2070 systems using three classes of applications: a highly regular and parallel 2D Jacobi solver, a regular dense matrix Cholesky factorization representing linear algebra computations with dependencies among parallel computations, and highly irregular molecular dynamics simulations. With our generic framework, we obtain 1.5 to 15 times improvement over the previous GPU-based implementation of Charm++. We also obtain about 14% improvement over an implementation of Cholesky factorization with a static work-distribution scheme.
KW - charm++
KW - combining kernels
KW - data management
KW - gpu
KW - hybrid execution
KW - optimizations
UR - https://www.scopus.com/pages/publications/84879813075
UR - https://www.scopus.com/pages/publications/84879813075#tab=citedBy
U2 - 10.1145/2464996.2465444
DO - 10.1145/2464996.2465444
M3 - Conference contribution
AN - SCOPUS:84879813075
SN - 9781450321303
T3 - Proceedings of the International Conference on Supercomputing
SP - 349
EP - 358
BT - ICS 2013 - Proceedings of the 2013 ACM International Conference on Supercomputing
Y2 - 10 June 2013 through 14 June 2013
ER -