TY - GEN
T1 - Scaling hierarchical N-body simulations on GPU clusters
AU - Jetley, Pritish
AU - Wesolowski, Lukasz
AU - Gioachin, Filippo
AU - Kalé, Laxmikant V.
AU - Quinn, Thomas R.
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
AB - This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation work, grain size selection through the tuning of offloaded work request sizes, and the reduction of sequential bottlenecks. The effects of various application parameters are studied and experiments done to quantify gains in performance. Our studies are carried out in the context of a production-quality parallel cosmological simulator called ChaNGa. We highlight the re-engineering of the application to make it more suitable for GPU-based environments. Finally, we present performance results from experiments on the NCSA Lincoln GPU cluster, including a note on GPU use in multistepped simulations.
KW - Barnes-Hut algorithm
KW - General purpose graphics processors
KW - N-body simulations
KW - Performance analysis
UR - https://www.scopus.com/pages/publications/78650804472
U2 - 10.1109/SC.2010.49
DO - 10.1109/SC.2010.49
M3 - Conference contribution
AN - SCOPUS:78650804472
SN - 9781424475575
T3 - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
BT - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
T2 - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
Y2 - 13 November 2010 through 19 November 2010
ER -