An efficient GPU implementation and scaling for higher-order 3D stencils

Omer Anjum, Mohammad Almasri, Simon Garcia de Gonzalo, Wen-mei Hwu

Research output: Contribution to journal › Article › peer-review


Stencil computation patterns are the backbone of many scientific and engineering simulations. Stencil computation is known to be constrained by its high demand for memory bandwidth, which limits performance on accelerators such as GPUs. Prior GPU-based approaches concentrated on stencils containing only axis-aligned grid points, using 2D caching schemes. For stencils with non-axis-aligned grid points, however, prior approaches use 3D caching or multi-pass 2D caching schemes. These methods either incur a large number of global memory accesses or require a large amount of shared memory. In this work, we present an efficient GPU implementation scheme, “Scatter Without Write Conflict” (SWiC), for large advanced 3D stencil patterns involving non-axis-aligned grid points. Unlike 3D caching schemes, SWiC needs only 2D caching and a single pass over the stencil per iteration. SWiC achieves significant reductions in global memory accesses without increasing the size of shared memory. SWiC can also be applied to simple axis-aligned stencils without any performance loss. Moreover, we propose a scalable implementation for the halo region exchange of 3D stencils on multi-GPU nodes. For evaluation, we test SWiC on three Nvidia GPU generations and show that our approach significantly outperforms existing state-of-the-art GPU implementations, with speedups ranging from 1.6× to 5.75×. We also provide a detailed scaling analysis in multi-node and multi-GPU environments.
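The contrast between a gather over a 3D neighborhood and a plane-by-plane scatter that reads only one 2D input plane at a time can be sketched on the CPU. The following is a minimal illustration in plain Python, not the SWiC kernel from the paper: the tap offsets and weights in `TAPS`, the grid size, and the function names are all assumptions made up for the example. It only shows that, for a stencil with diagonal (non-axis-aligned) taps, sweeping the input one plane at a time and accumulating into per-output partial sums gives the same result as the direct gather.

```python
# Illustrative sketch only (hypothetical taps and names, not the paper's code).
# Grids are indexed as grid[z][y][x]; taps are keyed by (dx, dy, dz).
TAPS = {
    (0, 0, 0): -4.0,
    (1, 0, 0): 1.0, (-1, 0, 0): 1.0,      # axis-aligned taps along x
    (0, 1, 0): 1.0, (0, -1, 0): 1.0,      # axis-aligned taps along y
    (1, 0, 1): 0.25, (-1, 0, -1): 0.25,   # diagonal taps in the x-z plane
    (1, 0, -1): -0.25, (-1, 0, 1): -0.25,
}
R = 1  # stencil radius (width of the boundary layer left untouched)


def zeros(n):
    """Return an n x n x n grid of zeros."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


def make_grid(n):
    """Return a deterministic n x n x n test grid with small integer values."""
    return [[[float((7 * x + 3 * y + 5 * z) % 11) for x in range(n)]
             for y in range(n)] for z in range(n)]


def gather(src, n):
    """Direct form: each output point reads its full 3D neighborhood."""
    out = zeros(n)
    for z in range(R, n - R):
        for y in range(R, n - R):
            for x in range(R, n - R):
                acc = 0.0
                for (dx, dy, dz), w in TAPS.items():
                    acc += w * src[z + dz][y + dy][x + dx]
                out[z][y][x] = acc
    return out


def plane_sweep_scatter(src, n):
    """Scatter form: visit one 2D input plane at a time and add its
    contribution to every output plane that depends on it."""
    out = zeros(n)
    for z_in in range(n):
        plane = src[z_in]  # the only input data read in this step
        for (dx, dy, dz), w in TAPS.items():
            z_out = z_in - dz  # output plane that uses this tap
            if not (R <= z_out < n - R):
                continue
            for y in range(R, n - R):
                for x in range(R, n - R):
                    # Each (x, y) column updates only its own partial sums.
                    out[z_out][y][x] += w * plane[y + dy][x + dx]
    return out


def max_abs_diff(a, b, n):
    """Largest absolute difference between two n x n x n grids."""
    return max(abs(a[z][y][x] - b[z][y][x])
               for z in range(n) for y in range(n) for x in range(n))
```

In this sketch each `(x, y)` column writes only to its own partial sums, which is the property a GPU version would presumably rely on to keep accumulators in registers while sharing a single 2D plane among threads; how SWiC actually arranges this is described in the paper itself.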

Original language: English (US)
Pages (from-to): 326-343
Number of pages: 18
Journal: Information Sciences
State: Published - Mar 2022


Keywords

  • Finite difference solver
  • Fluid dynamics
  • GPU
  • High-order 3D stencil
  • MHD
  • Register blocking
  • Scaling
  • Scatter

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

