## Abstract

Solving sparse triangular systems of linear equations is a performance bottleneck in many methods for solving more general sparse systems. Both direct methods and many iterative preconditioners rely on triangular solves to compute the solution or to improve an approximate solution, often across many iterations. Solving triangular systems is notoriously resistant to parallelism, however, and existing parallel linear algebra packages appear to be ineffective in exploiting significant parallelism for this problem. We develop a novel parallel algorithm based on various heuristics that adapt to the structure of the matrix and extract parallelism that is unexploited by conventional methods. By analyzing and reordering operations, our algorithm can often extract parallelism even for cases where most of the nonzero matrix entries are near the diagonal. Our main parallelism strategies are: (1) identify independent rows, (2) send data earlier to achieve greater overlap, and (3) process dense off-diagonal regions in parallel. We describe the implementation of our algorithm in Charm++ and MPI and present promising experimental results on up to 512 cores of BlueGene/P, using numerous sparse matrices from real applications.
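The "identify independent rows" strategy can be illustrated with level-set scheduling, a standard dependency-analysis technique for sparse triangular solves. This is a minimal sketch, not the paper's Charm++/MPI algorithm: the CSR layout, function names, and sequential inner loop are illustrative assumptions, and rows within a level would be distributed across processors in an actual parallel implementation.

```python
def level_sets(indptr, indices, n):
    """Group rows of a lower triangular CSR matrix into levels.

    Row i's level is 1 + the maximum level of any earlier row it
    references, so all rows within one level are mutually independent
    and could be solved in parallel.
    """
    level = [0] * n
    for i in range(n):
        lv = 0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry -> dependency on row j
                lv = max(lv, level[j] + 1)
        level[i] = lv
    nlevels = (max(level) + 1) if n else 0
    sets = [[] for _ in range(nlevels)]
    for i in range(n):
        sets[level[i]].append(i)
    return sets


def triangular_solve(indptr, indices, data, b):
    """Solve Lx = b level by level (forward substitution).

    Rows inside one level have no mutual dependencies, so the inner
    loop is the part a parallel solver would distribute.
    """
    n = len(b)
    x = [0.0] * n
    for rows in level_sets(indptr, indices, n):
        for i in rows:  # independent rows: safe to process concurrently
            s = b[i]
            diag = 1.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    s -= data[k] * x[j]
                else:
                    diag = data[k]
            x[i] = s / diag
    return x
```

For a matrix whose first two rows touch only their own diagonals, both land in level 0 and can run concurrently even though the matrix is triangular; matrices dominated by near-diagonal entries yield long chains of small levels, which is why the paper's further strategies (earlier sends, parallel handling of dense off-diagonal regions) matter.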

| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 454-470 |
| Number of pages | 17 |
| Journal | Parallel Computing |
| Volume | 40 |
| Issue number | 9 |
| DOIs | |
| State | Published - Oct 2014 |

## Keywords

- Distributed memory computers
- Parallel algorithms
- Sparse linear systems
- Triangular solver

## ASJC Scopus subject areas

- Computer Networks and Communications
- Software
- Hardware and Architecture
- Artificial Intelligence
- Computer Graphics and Computer-Aided Design
- Theoretical Computer Science