### Abstract

We present a new parallel algorithm for solving triangular systems with multiple right-hand sides (TRSM). TRSM is used extensively in numerical linear algebra, both to solve triangular linear systems of equations and to compute factorizations with triangular matrices, such as Cholesky, LU, and QR. Our algorithm achieves better theoretical scalability than known alternatives, while maintaining numerical stability, via selective use of triangular matrix inversion. We leverage the fact that triangular inversion and matrix multiplication are more parallelizable than the standard TRSM algorithm. By inverting only the triangular blocks along the diagonal of the initial matrix, we generalize both the standard TRSM computation and the full matrix inversion approach. This flexibility leads to an efficient algorithm for any ratio of the number of right-hand sides to the triangular matrix dimension. We provide a detailed communication cost analysis for our algorithm as well as for recursive triangular matrix inversion. This cost analysis makes it possible to determine optimal block sizes and processor grids a priori. Relative to the best known algorithms for TRSM, our approach can require asymptotically fewer messages, while performing optimal amounts of computation and communication in terms of words sent.
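The idea of inverting only the diagonal blocks can be illustrated with a small sequential sketch. This is not the authors' parallel algorithm and says nothing about its communication costs; it only shows how a block size interpolates between substitution (block size 1) and full inversion (block size equal to the matrix dimension). The function names `invert_lower_triangular` and `block_trsm`, the lower-triangular convention, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def invert_lower_triangular(L):
    """Recursively invert a lower-triangular matrix by 2x2 block splitting.

    For L = [[A, 0], [B, C]] the inverse is
    [[A^-1, 0], [-C^-1 B A^-1, C^-1]].
    """
    n = L.shape[0]
    if n == 1:
        return np.array([[1.0 / L[0, 0]]])
    h = n // 2
    A_inv = invert_lower_triangular(L[:h, :h])
    C_inv = invert_lower_triangular(L[h:, h:])
    out = np.zeros_like(L, dtype=float)
    out[:h, :h] = A_inv
    out[h:, h:] = C_inv
    out[h:, :h] = -C_inv @ L[h:, :h] @ A_inv
    return out


def block_trsm(L, B, block_size):
    """Solve L X = B for lower-triangular L, inverting only diagonal blocks.

    Each diagonal block is inverted and applied with a matrix product;
    the blocks below it update the remaining right-hand sides.
    """
    n = L.shape[0]
    X = np.array(B, dtype=float)  # working copy of the right-hand sides
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        D_inv = invert_lower_triangular(L[start:end, start:end])
        X[start:end] = D_inv @ X[start:end]            # solve with this block
        X[end:] -= L[end:, start:end] @ X[start:end]   # update trailing rows
    return X


# Hypothetical demo data: a 4x4 system with two right-hand sides.
L_demo = np.array([[2.0, 0.0, 0.0, 0.0],
                   [1.0, 3.0, 0.0, 0.0],
                   [0.0, 1.0, 4.0, 0.0],
                   [2.0, 0.0, 1.0, 5.0]])
B_demo = np.arange(8, dtype=float).reshape(4, 2)
for b in (1, 2, 4):  # substitution, blocked, full inversion
    X_demo = block_trsm(L_demo, B_demo, b)
    assert np.allclose(L_demo @ X_demo, B_demo)
```

In this sketch every step other than the small recursive inversions is a matrix product, which is the property the abstract describes as more parallelizable than the standard TRSM algorithm.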

Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 678-687 |
Number of pages | 10 |
ISBN (Electronic) | 9781538639146 |
DOIs | https://doi.org/10.1109/IPDPS.2017.104 |
State | Published - Jun 30 2017 |
Event | 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017 - Orlando, United States. Duration: May 29 2017 → Jun 2 2017 |

### Publication series

Name | Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017 |
---|---|

### Other

Other | 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017 |
---|---|
Country | United States |
City | Orlando |
Period | 5/29/17 → 6/2/17 |

### Keywords

- 3D algorithms
- TRSM
- communication cost

### ASJC Scopus subject areas

- Information Systems
- Computer Networks and Communications
- Hardware and Architecture

### Cite this

Wicky, T., Solomonik, E., & Hoefler, T. (2017). Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations. In *Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017* (pp. 678-687). [7967158] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2017.104

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations

AU - Wicky, Tobias

AU - Solomonik, Edgar

AU - Hoefler, Torsten

PY - 2017/6/30

Y1 - 2017/6/30

KW - 3D algorithms

KW - TRSM

KW - communication cost

UR - http://www.scopus.com/inward/record.url?scp=85027683884&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027683884&partnerID=8YFLogxK

U2 - 10.1109/IPDPS.2017.104

DO - 10.1109/IPDPS.2017.104

M3 - Conference contribution

AN - SCOPUS:85027683884

T3 - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

SP - 678

EP - 687

BT - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -