TY - GEN
T1 - Matrix multiplication on multidimensional torus networks
AU - Solomonik, Edgar
AU - Demmel, James
PY - 2013
Y1 - 2013
N2 - Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon on unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that an MPI version of SD-Cannon can exploit multiple communication links and improve performance.
AB - Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure. This algorithm is useful for torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to efficiently map SD-Cannon on unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that an MPI version of SD-Cannon can exploit multiple communication links and improve performance.
UR - http://www.scopus.com/inward/record.url?scp=84883262340&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883262340&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-38718-0_21
DO - 10.1007/978-3-642-38718-0_21
M3 - Conference contribution
AN - SCOPUS:84883262340
SN - 9783642387173
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 201
EP - 215
BT - High Performance Computing for Computational Science, VECPAR 2012 - 10th International Conference, Revised Selected Papers
T2 - 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012
Y2 - 17 July 2012 through 20 July 2012
ER -