TY - GEN

T1 - Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations

AU - Wicky, Tobias

AU - Solomonik, Edgar

AU - Hoefler, Torsten

N1 - Publisher Copyright:
© 2017 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.

PY - 2017/6/30

Y1 - 2017/6/30

AB - We present a new parallel algorithm for solving triangular systems with multiple right-hand sides (TRSM). TRSM is used extensively in numerical linear algebra computations, both to solve triangular linear systems of equations and to compute factorizations with triangular matrices, such as Cholesky, LU, and QR. Our algorithm achieves better theoretical scalability than known alternatives, while maintaining numerical stability, via selective use of triangular matrix inversion. We leverage the fact that triangular inversion and matrix multiplication are more parallelizable than the standard TRSM algorithm. By inverting only the triangular blocks along the diagonal of the initial matrix, we generalize both the usual TRSM computation and the full matrix inversion approach. This flexibility leads to an efficient algorithm for any ratio of the number of right-hand sides to the triangular matrix dimension. We provide a detailed communication cost analysis for our algorithm as well as for the recursive triangular matrix inversion. This cost analysis makes it possible to determine optimal block sizes and processor grids a priori. Relative to the best known algorithms for TRSM, our approach can require asymptotically fewer messages, while performing optimal amounts of computation and communication in terms of words sent.

KW - 3D algorithms

KW - TRSM

KW - communication cost

UR - http://www.scopus.com/inward/record.url?scp=85027683884&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027683884&partnerID=8YFLogxK

U2 - 10.1109/IPDPS.2017.104

DO - 10.1109/IPDPS.2017.104

M3 - Conference contribution

AN - SCOPUS:85027683884

T3 - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

SP - 678

EP - 687

BT - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017

Y2 - 29 May 2017 through 2 June 2017

ER -