TY - GEN
T1 - Communication avoiding and overlapping for numerical linear algebra
AU - Georganas, Evangelos
AU - González-Domínguez, Jorge
AU - Solomonik, Edgar
AU - Zheng, Yili
AU - Touriño, Juan
AU - Yelick, Katherine
PY - 2012
Y1 - 2012
N2 - To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SUMMA), triangular solve, and Cholesky factorization. For each algorithm, we construct a detailed performance model that considers both critical path dependencies and idle time. We give novel implementations of 2.5D algorithms with overlap for each of these problems. Our software employs UPC, a partitioned global address space (PGAS) language that provides fast one-sided communication. We show communication avoidance and overlap provide a cumulative benefit as core counts scale, including results using over 24K cores of a Cray XE6 system.
AB - To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SUMMA), triangular solve, and Cholesky factorization. For each algorithm, we construct a detailed performance model that considers both critical path dependencies and idle time. We give novel implementations of 2.5D algorithms with overlap for each of these problems. Our software employs UPC, a partitioned global address space (PGAS) language that provides fast one-sided communication. We show communication avoidance and overlap provide a cumulative benefit as core counts scale, including results using over 24K cores of a Cray XE6 system.
UR - http://www.scopus.com/inward/record.url?scp=84877714396&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877714396&partnerID=8YFLogxK
U2 - 10.1109/SC.2012.32
DO - 10.1109/SC.2012.32
M3 - Conference contribution
AN - SCOPUS:84877714396
SN - 9781467308069
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
T2 - 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
Y2 - 10 November 2012 through 16 November 2012
ER -