TY - JOUR
T1 - OpenACC acceleration of the Nek5000 spectral element code
AU - Markidis, Stefano
AU - Gong, Jing
AU - Schliephake, Michael
AU - Laure, Erwin
AU - Hart, Alistair
AU - Henty, David
AU - Heisey, Katherine
AU - Fischer, Paul
N1 - Publisher Copyright: © The Author(s) 2015.
PY - 2015/8/28
Y1 - 2015/8/28
N2 - We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code was used as the base and enhanced with OpenACC directives. The profiling of NekBone provided an assessment of the suitability of the code for GPU systems, and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve implementation using OpenACC leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version leads to a 43 Gflop performance on a single node. In addition, we ported and optimized NekBone to parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.
AB - We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code was used as the base and enhanced with OpenACC directives. The profiling of NekBone provided an assessment of the suitability of the code for GPU systems, and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve implementation using OpenACC leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version leads to a 43 Gflop performance on a single node. In addition, we ported and optimized NekBone to parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.
KW - Nek5000
KW - OpenACC
KW - optimization of NekBone with OpenACC
KW - porting NekBone to GPU
UR - http://www.scopus.com/inward/record.url?scp=84938095938&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938095938&partnerID=8YFLogxK
U2 - 10.1177/1094342015576846
DO - 10.1177/1094342015576846
M3 - Article
AN - SCOPUS:84938095938
SN - 1094-3420
VL - 29
SP - 311
EP - 319
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 3
ER -