Matrix multiplication on multidimensional torus networks

Edgar Solomonik, James Demmel

Research output: Chapter in Book/Report/Conference proceedingConference contribution


Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon on unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that a MPI version of SD-Cannon can exploit multiple communication links and improve performance.

Original languageEnglish (US)
Title of host publicationHigh Performance Computing for Computational Science, VECPAR 2012 - 10th International Conference, Revised Selected Papers
Number of pages15
StatePublished - 2013
Externally publishedYes
Event10th International Conference on High Performance Computing for Computational Science, VECPAR 2012 - Kobe, Japan
Duration: Jul 17 2012Jul 20 2012

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume7851 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


Other10th International Conference on High Performance Computing for Computational Science, VECPAR 2012

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science


Dive into the research topics of 'Matrix multiplication on multidimensional torus networks'. Together they form a unique fingerprint.

Cite this