Optimizing sparse matrix-matrix multiplication for the GPU

Steven Dalton, Luke Olson, Nathan Bell

Research output: Contribution to journal › Article › peer-review


Sparse matrix-matrix multiplication (SpGEMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpGEMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpGEMM operation into three highly parallel phases: expansion, sorting, and contraction, and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
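The expansion-sort-contraction decomposition described in the abstract can be illustrated with a small sketch. The following Python code is not the authors' GPU implementation; it is a minimal sequential illustration of the three phases, assuming sparse matrices are represented as coordinate dictionaries `{(row, col): value}` (a hypothetical representation chosen for brevity):

```python
from itertools import groupby

def spgemm_esc(A, B):
    """Compute C = A @ B for sparse matrices given as {(row, col): value}
    dicts, using the expansion-sort-contraction (ESC) scheme."""
    # Group B's nonzeros by row for fast lookup during expansion.
    B_rows = {}
    for (k, j), v in B.items():
        B_rows.setdefault(k, []).append((j, v))

    # Expansion: each product A[i,k] * B[k,j] becomes an (i, j, value) triple.
    triples = []
    for (i, k), a in A.items():
        for j, b in B_rows.get(k, []):
            triples.append((i, j, a * b))

    # Sorting: order triples by (row, column) so duplicate keys are adjacent.
    triples.sort(key=lambda t: (t[0], t[1]))

    # Contraction: sum adjacent triples that share the same (row, column) key.
    C = {}
    for (i, j), group in groupby(triples, key=lambda t: (t[0], t[1])):
        C[(i, j)] = sum(v for _, _, v in group)
    return C
```

On a GPU each phase maps to a data-parallel primitive (segmented multiplication, parallel sort, segmented reduction); the paper's row-wise adaptive processing further reduces the work and memory traffic of the sort and contraction steps.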

Original language: English (US)
Article number: 25
Journal: ACM Transactions on Mathematical Software
Issue number: 4
State: Published - Oct 2015


Keywords

  • GPU
  • Matrix-matrix
  • Parallel
  • Sparse

ASJC Scopus subject areas

  • Software
  • Applied Mathematics
