Communication-avoiding CholeskyQR2 for rectangular matrices

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show its effectiveness for a wide range of matrix sizes. Our algorithm executes over a 3D processor grid, the dimensions of which can be tuned to trade-off costs in synchronization, interprocessor communication, computational work, and memory footprint. We implement this algorithm, yielding a code that can achieve a factor of Θ(P1/6) less interprocessor communication on P processors than any previous parallel QR implementation. Our performance study on Intel Knights-Landing and Cray XE supercomputers demonstrates the effectiveness of this CholeskyQR2 parallelization on a large number of nodes. Specifically, relative to ScaLAPACK's QR, on 1024 nodes of Stampede2, our CholeskyQR2 implementation is faster by 2.6x-3.3x in strong scaling tests and by 1.1x-1.9x in weak scaling tests.

Original languageEnglish (US)
Title of host publicationProceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages89-100
Number of pages12
ISBN (Electronic)9781728112466
DOIs
StatePublished - May 2019
Event33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019 - Rio de Janeiro, Brazil
Duration: May 20 2019May 24 2019

Publication series

NameProceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019

Conference

Conference33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019
CountryBrazil
CityRio de Janeiro
Period5/20/195/24/19

Fingerprint

Communication
Supercomputers
Landing
Factorization
Synchronization
Data storage equipment
Costs
Node
Scaling
Least squares
Factors
Trade-offs
Grid
Eigenvalues
Knight

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management

Cite this

Hutter, E., & Solomonik, E. V. (2019). Communication-avoiding CholeskyQR2 for rectangular matrices. In Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019 (pp. 89-100). [8820981] (Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/IPDPS.2019.00020

Communication-avoiding CholeskyQR2 for rectangular matrices. / Hutter, Edward; Solomonik, Edgar Vadimovich.

Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 89-100 8820981 (Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Hutter, E & Solomonik, EV 2019, Communication-avoiding CholeskyQR2 for rectangular matrices. in Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019., 8820981, Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019, Institute of Electrical and Electronics Engineers Inc., pp. 89-100, 33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019, Rio de Janeiro, Brazil, 5/20/19. https://doi.org/10.1109/IPDPS.2019.00020
Hutter E, Solomonik EV. Communication-avoiding CholeskyQR2 for rectangular matrices. In Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 89-100. 8820981. (Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019). https://doi.org/10.1109/IPDPS.2019.00020
Hutter, Edward ; Solomonik, Edgar Vadimovich. / Communication-avoiding CholeskyQR2 for rectangular matrices. Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 89-100 (Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019).
@inproceedings{6d0838f91c414409a6be9633193479d7,
title = "Communication-avoiding CholeskyQR2 for rectangular matrices",
abstract = "Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show its effectiveness for a wide range of matrix sizes. Our algorithm executes over a 3D processor grid, the dimensions of which can be tuned to trade-off costs in synchronization, interprocessor communication, computational work, and memory footprint. We implement this algorithm, yielding a code that can achieve a factor of Θ(P1/6) less interprocessor communication on P processors than any previous parallel QR implementation. Our performance study on Intel Knights-Landing and Cray XE supercomputers demonstrates the effectiveness of this CholeskyQR2 parallelization on a large number of nodes. Specifically, relative to ScaLAPACK's QR, on 1024 nodes of Stampede2, our CholeskyQR2 implementation is faster by 2.6x-3.3x in strong scaling tests and by 1.1x-1.9x in weak scaling tests.",
author = "Edward Hutter and Solomonik, {Edgar Vadimovich}",
year = "2019",
month = "5",
doi = "10.1109/IPDPS.2019.00020",
language = "English (US)",
series = "Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "89--100",
booktitle = "Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019",
address = "United States",

}

TY - GEN

T1 - Communication-avoiding CholeskyQR2 for rectangular matrices

AU - Hutter, Edward

AU - Solomonik, Edgar Vadimovich

PY - 2019/5

Y1 - 2019/5

N2 - Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show its effectiveness for a wide range of matrix sizes. Our algorithm executes over a 3D processor grid, the dimensions of which can be tuned to trade-off costs in synchronization, interprocessor communication, computational work, and memory footprint. We implement this algorithm, yielding a code that can achieve a factor of Θ(P1/6) less interprocessor communication on P processors than any previous parallel QR implementation. Our performance study on Intel Knights-Landing and Cray XE supercomputers demonstrates the effectiveness of this CholeskyQR2 parallelization on a large number of nodes. Specifically, relative to ScaLAPACK's QR, on 1024 nodes of Stampede2, our CholeskyQR2 implementation is faster by 2.6x-3.3x in strong scaling tests and by 1.1x-1.9x in weak scaling tests.

AB - Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show its effectiveness for a wide range of matrix sizes. Our algorithm executes over a 3D processor grid, the dimensions of which can be tuned to trade-off costs in synchronization, interprocessor communication, computational work, and memory footprint. We implement this algorithm, yielding a code that can achieve a factor of Θ(P1/6) less interprocessor communication on P processors than any previous parallel QR implementation. Our performance study on Intel Knights-Landing and Cray XE supercomputers demonstrates the effectiveness of this CholeskyQR2 parallelization on a large number of nodes. Specifically, relative to ScaLAPACK's QR, on 1024 nodes of Stampede2, our CholeskyQR2 implementation is faster by 2.6x-3.3x in strong scaling tests and by 1.1x-1.9x in weak scaling tests.

UR - http://www.scopus.com/inward/record.url?scp=85072819853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072819853&partnerID=8YFLogxK

U2 - 10.1109/IPDPS.2019.00020

DO - 10.1109/IPDPS.2019.00020

M3 - Conference contribution

AN - SCOPUS:85072819853

T3 - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019

SP - 89

EP - 100

BT - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019

PB - Institute of Electrical and Electronics Engineers Inc.

ER -