TY - GEN
T1 - Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations
AU - Solomonik, Edgar
AU - Carson, Erin
AU - Knight, Nicholas
AU - Demmel, James
PY - 2014
Y1 - 2014
N2 - This paper derives tradeoffs between three basic costs of a parallel algorithm: synchronization, data movement, and computational cost. These tradeoffs are lower bounds on the execution time of the algorithm which are independent of the number of processors, but dependent on the problem size. Therefore, they provide lower bounds on the parallel execution time of any algorithm computed by a system composed of any number of homogeneous components, each with associated computational, communication, and synchronization payloads. We employ a theoretical model that counts the amount of work and data movement as a maximum over any execution path during the parallel computation. By considering this metric, rather than the total communication volume over the whole machine, we obtain new insights into the characteristics of parallel schedules for algorithms with non-trivial dependency structures. We also present reductions from BSP and LogP algorithms to our execution model, extending our lower bounds to these two models of parallel computation. We first develop our results for general dependency graphs and hypergraphs based on their expansion properties, then apply these results to a number of specific algorithms in numerical linear algebra, namely triangular substitution, Gaussian elimination, and Krylov subspace methods. Our lower bound for LU factorization demonstrates the optimality of Tiskin's LU algorithm [26], answering an open question posed in his paper, as well as of the 2.5D LU algorithm [21], which has analogous costs. We treat the computations in a general manner by noting that they share a similar dependency hypergraph structure and by analyzing the communication requirements of lattice hypergraph structures.
AB - This paper derives tradeoffs between three basic costs of a parallel algorithm: synchronization, data movement, and computational cost. These tradeoffs are lower bounds on the execution time of the algorithm which are independent of the number of processors, but dependent on the problem size. Therefore, they provide lower bounds on the parallel execution time of any algorithm computed by a system composed of any number of homogeneous components, each with associated computational, communication, and synchronization payloads. We employ a theoretical model that counts the amount of work and data movement as a maximum over any execution path during the parallel computation. By considering this metric, rather than the total communication volume over the whole machine, we obtain new insights into the characteristics of parallel schedules for algorithms with non-trivial dependency structures. We also present reductions from BSP and LogP algorithms to our execution model, extending our lower bounds to these two models of parallel computation. We first develop our results for general dependency graphs and hypergraphs based on their expansion properties, then apply these results to a number of specific algorithms in numerical linear algebra, namely triangular substitution, Gaussian elimination, and Krylov subspace methods. Our lower bound for LU factorization demonstrates the optimality of Tiskin's LU algorithm [26], answering an open question posed in his paper, as well as of the 2.5D LU algorithm [21], which has analogous costs. We treat the computations in a general manner by noting that they share a similar dependency hypergraph structure and by analyzing the communication requirements of lattice hypergraph structures.
UR - http://www.scopus.com/inward/record.url?scp=84904480275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904480275&partnerID=8YFLogxK
U2 - 10.1145/2612669.2612671
DO - 10.1145/2612669.2612671
M3 - Conference contribution
AN - SCOPUS:84904480275
SN - 9781450328210
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 307
EP - 318
BT - SPAA 2014 - Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures
PB - Association for Computing Machinery
T2 - 26th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2014
Y2 - 23 June 2014 through 25 June 2014
ER -