Dense Dynamic Blocks: Optimizing SpMM for Processors with Vector and Matrix Units Using Machine Learning Techniques

Serif Yesil, José E. Moreira, Josep Torrellas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Recent processors have been augmented with matrix-multiply units that operate on small matrices, creating a functional unit-rich environment. These units have been successfully employed on dense matrix operations such as those found in the Basic Linear Algebra Subprograms (BLAS). In this work, we exploit these new matrix-multiply facilities to speed up Sparse Matrix-Dense Matrix Multiplication (SpMM) for highly sparse matrices. SpMM is hard to optimize: the sparsity patterns lead to highly irregular memory access behavior. Additionally, each sparse matrix has unique characteristics, making it hard to find a single SpMM strategy that works well for all sparse matrices. The addition of matrix-multiply units makes this even more challenging. In this paper, we address these challenges. First, we design Dense Dynamic Blocks (DDB), a method to utilize the new matrix units. DDB has two specialized versions: DDB-MM and DDB-HYB. DDB-MM is a strategy that utilizes only the matrix-multiply facilities. DDB-HYB is a hybrid approach that maximizes floating-point throughput by utilizing both vector and matrix units. Furthermore, we design a prediction mechanism, SpMM-OPT, for identifying the best SpMM strategy for a given sparse matrix and dense matrix pair. SpMM-OPT selects among vector-unit-oriented, matrix-unit-oriented, and hybrid strategies for the highest floating-point throughput while taking cache optimizations into account. We experiment with 440 matrices from the well-known SuiteSparse matrix collection on a POWER10 system with vector and matrix units. We show that DDB-MM and DDB-HYB can achieve floating-point throughputs of up to 1.1 and 2.5 TFLOP/s on a POWER10 single-chip module for double- and single-precision SpMM, respectively. Our analysis also shows that SpMM-OPT effectively chooses the best SpMM strategy and can achieve an average speedup of up to 2X compared to an optimized CSR baseline.
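For readers unfamiliar with the CSR baseline the paper compares against, the sketch below is a minimal, illustrative SpMM reference in Python/SciPy (not the paper's POWER10 implementation; the function name `spmm_csr` is our own). It shows why SpMM access is irregular: each nonzero of the sparse matrix gathers an arbitrary row of the dense matrix, which is the behavior DDB's dense blocks are designed to regularize for the matrix units.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def spmm_csr(indptr, indices, data, B):
    """Reference CSR SpMM: C = A @ B, with A sparse (CSR arrays) and B dense."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    for i in range(n_rows):
        # Nonzeros of row i occupy data[indptr[i]:indptr[i+1]].
        for k in range(indptr[i], indptr[i + 1]):
            # Each nonzero A[i, j] scales row j of B into row i of C.
            # The column indices `indices[k]` follow the sparsity pattern,
            # so these B-row accesses are data-dependent and irregular.
            C[i, :] += data[k] * B[indices[k], :]
    return C

# Small illustrative run on a random highly sparse matrix.
A = sparse_random(64, 64, density=0.05, format="csr",
                  dtype=np.float64, random_state=0)
B = np.random.default_rng(0).standard_normal((64, 8))
C = spmm_csr(A.indptr, A.indices, A.data, B)
```

An optimized baseline would additionally vectorize the inner loop and tile for cache; the point here is only the access pattern that makes SpMM hard to map onto dense matrix-multiply units.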

Original language: English (US)
Title of host publication: Proceedings of the 36th ACM International Conference on Supercomputing, ICS 2022
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450392815
State: Published - Jun 28 2022
Event: 36th ACM International Conference on Supercomputing, ICS 2022 - Virtual, Online
Duration: Jun 27 2022 - Jun 30 2022

Publication series

Name: Proceedings of the International Conference on Supercomputing


Conference: 36th ACM International Conference on Supercomputing, ICS 2022
City: Virtual, Online


Keywords

  • Matrix-multiply assist
  • SpMM
  • Sparse matrix-matrix multiply

ASJC Scopus subject areas

  • General Computer Science


