TY - JOUR
T1 - Compiler techniques for the distribution of data and computation
AU - Navarro, Angeles
AU - Zapata, Emilio
AU - Padua, David
N1 - Funding Information:
The authors would like to thank the anonymous referees for their helpful and insightful suggestions. This work was supported in part by the European Union under contract 1FD97-2103, and by the Ministry of Education of Spain under contract TIC2000-1658.
PY - 2003/6
Y1 - 2003/6
N2 - This paper presents a new method that can be applied by a parallelizing compiler to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a Locality-Communication Graph (LCG) and the formulation of the compiler technique as a Mixed Integer Nonlinear Programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes the process of how the compiler extracts the locality information from a nonannotated code and focuses on how this compiler can derive the optimization problem, solve it, and generate the parallel code with the automatically selected iteration and data distributions. In addition, we include a discussion about our model and the solutions - the decompositions - that it provides. The approach presented in the paper is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines.
AB - This paper presents a new method that can be applied by a parallelizing compiler to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a Locality-Communication Graph (LCG) and the formulation of the compiler technique as a Mixed Integer Nonlinear Programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes the process of how the compiler extracts the locality information from a nonannotated code and focuses on how this compiler can derive the optimization problem, solve it, and generate the parallel code with the automatically selected iteration and data distributions. In addition, we include a discussion about our model and the solutions - the decompositions - that it provides. The approach presented in the paper is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines.
KW - Communication pattern
KW - Load balancing
KW - Locality analysis
KW - Mixed integer nonlinear programming
KW - Parallelizing compiler
UR - http://www.scopus.com/inward/record.url?scp=0041848305&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0041848305&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2003.1206503
DO - 10.1109/TPDS.2003.1206503
M3 - Article
AN - SCOPUS:0041848305
SN - 1045-9219
VL - 14
SP - 545
EP - 562
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 6
ER -