Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM

Charles Block, Gerasimos Gerogiannis, Charith Mendis, Ariful Azad, Josep Torrellas

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Sparse matrix dense matrix multiplication (SpMM) is commonly used in applications ranging from scientific computing to graph neural networks. Typically, when SpMM is executed in a distributed platform, communication costs dominate. Such costs depend on how communication is scheduled. If it is scheduled in a sparsity-unaware manner, such as with collectives, execution is often inefficient due to unnecessary data transfers. On the other hand, if communication is scheduled in a fine-grained sparsity-aware manner, communicating only the necessary data, execution can also be inefficient due to high software overhead.We observe that individual sparse matrices often contain regions that are denser and regions that are sparser. Based on this observation, we develop a model that partitions communication into sparsity-unaware and sparsity-aware components. Leveraging the partition, we develop a new algorithm that performs collective communication for the denser regions, and fine-grained, one-sided communication for the sparser regions. We call the algorithm Two-Face. We show that Two-Face attains an average speedup of 2.11x over prior work when evaluated on a 4096-core supercomputer. Additionally, Two-Face scales well with the machine size.

Original languageEnglish (US)
Title of host publicationSummer Cycle
PublisherAssociation for Computing Machinery
Pages1200-1217
Number of pages18
ISBN (Electronic)9798400703850
DOIs
StatePublished - Apr 27 2024
Event29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 - San Diego, United States
Duration: Apr 27 2024May 1 2024

Publication series

NameInternational Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Volume2

Conference

Conference29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024
Country/TerritoryUnited States
CitySan Diego
Period4/27/245/1/24

Keywords

  • distributed algorithms
  • high-performance computing
  • sparse matrices
  • SpMM

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM'. Together they form a unique fingerprint.

Cite this