Convolution: An introduction to constant memory and caching

Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj

Research output: Chapter in Book/Report/Conference proceeding (Chapter)


This chapter presents convolution as an important parallel computation pattern. Convolution is used in many applications, such as computer vision and video processing, and it also represents a general pattern that forms the basis of many parallel algorithms. We start with the concept of convolution. We then present a basic parallel convolution algorithm whose execution speed is limited by the DRAM bandwidth needed to access both the input and mask elements. Next, we introduce constant memory and a simple modification to the kernel and host code that eliminates practically all DRAM accesses for the mask elements. This is followed by an input tiling kernel that eliminates most of the DRAM accesses for the input elements. We show that the code can be simplified by relying on the data caching available in more recent devices. Finally, we move to a two-dimensional convolution kernel, along with an analysis of the effectiveness of tiling as a function of tile size for one- and two-dimensional convolution.

Original language: English (US)
Title of host publication: Programming Massively Parallel Processors
Subtitle of host publication: A Hands-on Approach, Fourth Edition
Number of pages: 21
ISBN (Electronic): 9780323912310
ISBN (Print): 9780323984638
State: Published - Jan 1 2022


Keywords

  • Convolution
  • constant cache
  • constant memory
  • convolution filters
  • ghost cells
  • halo cells
  • tiling
  • tiling efficiency

ASJC Scopus subject areas

  • General Computer Science

