Optimizing sparse matrix-matrix multiplication for the GPU

Steven Dalton, Luke Olson, Nathan Bell

Research output: Contribution to journal › Article › peer-review

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key operation in numerous areas, from information science to the physical sciences. Implementing SpGEMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpGEMM operation into three highly parallel phases: expansion, sorting, and contraction, and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general, and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
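The three-phase decomposition named in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation (which is a parallel CUDA code); it only models the expansion-sort-contraction pattern on a hypothetical coordinate-dictionary sparse format, with the function name `spgemm_esc` chosen for illustration:

```python
from collections import defaultdict

def spgemm_esc(A, B):
    """Compute C = A * B, where A and B map (row, col) -> nonzero value.

    Sequential model of the expansion-sort-contraction (ESC) pattern;
    on a GPU each phase would run as a data-parallel kernel.
    """
    # Group B's nonzeros by row for fast lookup (a stand-in for CSR row access).
    B_rows = defaultdict(list)
    for (k, j), v in B.items():
        B_rows[k].append((j, v))

    # Expansion: every product A[i,k] * B[k,j] becomes a candidate entry of C.
    expanded = [(i, j, va * vb)
                for (i, k), va in A.items()
                for (j, vb) in B_rows[k]]

    # Sorting: bring duplicate (i, j) coordinates together.
    expanded.sort(key=lambda t: (t[0], t[1]))

    # Contraction: sum each run of equal coordinates into a single entry.
    contracted = []
    for i, j, v in expanded:
        if contracted and contracted[-1][0] == (i, j):
            contracted[-1][1] += v
        else:
            contracted.append([(i, j), v])
    return {ij: v for ij, v in contracted}
```

For example, multiplying A = {(0,0): 1, (0,1): 2, (1,1): 3} by B = {(0,0): 4, (1,0): 5, (1,1): 6} expands to five candidate products that contract to four output entries. In the paper's setting, expansion, sorting, and contraction each map onto high-throughput parallel primitives, and the row-wise adaptive strategy bounds the size of the intermediate expanded list, which is the main memory-bandwidth cost of this approach.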

Original language: English (US)
Article number: 25
Journal: ACM Transactions on Mathematical Software
Volume: 41
Issue number: 4
DOIs
State: Published - Oct 2015

Keywords

  • GPU
  • Matrix-matrix
  • Parallel
  • Sparse

ASJC Scopus subject areas

  • Software
  • Applied Mathematics
