TY - GEN
T1 - Optimizing Distributed Tensor Contractions Using Node-Aware Processor Grids
AU - Irmler, Andreas
AU - Kanakagiri, Raghavendra
AU - Ohlmann, Sebastian T.
AU - Solomonik, Edgar
AU - Grüneis, Andreas
N1 - Andreas Irmler and Andreas Grüneis acknowledge support from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 951786 (The NOMAD CoE). Raghavendra Kanakagiri and Edgar Solomonik received support from the US NSF OAC SSI program, via award No. 1931258. The authors acknowledge application support and computing time of the MPCDF.
Instructions to install and reproduce our results are available in a Figshare repository [11]. A full list of matrix dimensions, as well as the corresponding raw data, can be found there.
PY - 2023
Y1 - 2023
N2 - We propose an algorithm that aims to minimize the inter-node communication volume of distributed, memory-efficient tensor contraction schemes on modern multi-core compute nodes. The key idea is to define processor grids that optimize the intra-/inter-node communication volume in the employed contraction algorithms. We present an implementation of the proposed node-aware communication algorithm in the Cyclops Tensor Framework (CTF). We demonstrate that this implementation achieves significantly improved performance for matrix-matrix multiplication and tensor contractions on up to several hundred modern compute nodes compared to conventional implementations without node-aware processor grids. Our implementation performs well compared with existing state-of-the-art parallel matrix multiplication libraries (COSMA and ScaLAPACK). In addition to discussing the performance of matrix-matrix multiplication, we also investigate the performance of our node-aware communication algorithm for tensor contractions as they occur in quantum chemical coupled-cluster methods. To this end, we employ a modified version of CTF in combination with a coupled-cluster code (Cc4s). Our findings show that the node-aware communication algorithm also improves the performance of coupled-cluster theory calculations for real-world problems running on tens to hundreds of compute nodes.
AB - We propose an algorithm that aims to minimize the inter-node communication volume of distributed, memory-efficient tensor contraction schemes on modern multi-core compute nodes. The key idea is to define processor grids that optimize the intra-/inter-node communication volume in the employed contraction algorithms. We present an implementation of the proposed node-aware communication algorithm in the Cyclops Tensor Framework (CTF). We demonstrate that this implementation achieves significantly improved performance for matrix-matrix multiplication and tensor contractions on up to several hundred modern compute nodes compared to conventional implementations without node-aware processor grids. Our implementation performs well compared with existing state-of-the-art parallel matrix multiplication libraries (COSMA and ScaLAPACK). In addition to discussing the performance of matrix-matrix multiplication, we also investigate the performance of our node-aware communication algorithm for tensor contractions as they occur in quantum chemical coupled-cluster methods. To this end, we employ a modified version of CTF in combination with a coupled-cluster code (Cc4s). Our findings show that the node-aware communication algorithm also improves the performance of coupled-cluster theory calculations for real-world problems running on tens to hundreds of compute nodes.
UR - http://www.scopus.com/inward/record.url?scp=85171565978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85171565978&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-39698-4_48
DO - 10.1007/978-3-031-39698-4_48
M3 - Conference contribution
AN - SCOPUS:85171565978
SN - 9783031396977
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 710
EP - 724
BT - Euro-Par 2023
A2 - Cano, José
A2 - Dikaiakos, Marios D.
A2 - Papadopoulos, George A.
A2 - Pericàs, Miquel
A2 - Sakellariou, Rizos
PB - Springer
T2 - 29th International European Conference on Parallel and Distributed Computing, Euro-Par 2023
Y2 - 28 August 2023 through 1 September 2023
ER -