Effective minimally-invasive GPU acceleration of distributed sparse matrix factorization

Anshul Gupta, Natalia Gimelshein, Seid Koric, Steven Rennich

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sparse matrix factorization, a critical algorithm in many science and engineering applications, has had difficulty leveraging the additional computational power afforded by the infusion of heterogeneous accelerators in HPC clusters. We present a minimally invasive approach to the GPU acceleration of a hybrid multifrontal solver, the Watson Sparse Matrix Package, which is already highly optimized for the CPU and exhibits leading performance on distributed architectures. The novel aspect of this work is to demonstrate techniques for achieving substantial GPU acceleration, up to 3.5x, of the sparse factorization with strategic but contained changes to the original CPU-only code. Strong scaling results show that the performance benefits scale to as many as 512 nodes (4096 cores) of the Blue Waters supercomputer at NCSA. The techniques presented here suggest that detailed code reorganization may not be necessary to achieve substantial acceleration from GPUs, even for complex algorithms with highly irregular compute and data access patterns, like those used for distributed sparse factorization.

Original language: English (US)
Title of host publication: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016, Proceedings
Editors: Pierre-François Dutot, Denis Trystram
Publisher: Springer
Pages: 672-683
Number of pages: 12
ISBN (Print): 9783319436586
State: Published - 2016
Event: 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016 - Grenoble, France
Duration: Aug 24 2016 to Aug 26 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9833 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016
Country/Territory: France
City: Grenoble
Period: 8/24/16 to 8/26/16

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
