TY - JOUR
T1 - Compiler-optimized simulation of large-scale applications on high performance architectures
AU - Adve, Vikram S.
AU - Bagrodia, Rajive
AU - Deelman, Ewa
AU - Sakellariou, Rizos
N1 - Funding Information:
This work was supported by DARPA/ITO under Contract N66001-97-C-8533, "End-to-End Performance Modeling of Large Heterogeneous Adaptive Parallel/Distributed Computer/Communication Systems." See also the project's web page at http://www.cs.utexas.edu/users/poems/. The work was also supported in part by the ASCI ASAP program under DOE/LLNL Subcontract B347884, and by DARPA and Rome Laboratory, Air Force Materiel Command, USAF, under Agreement F30602-96-1-0159. We thank all the members of the POEMS project for their valuable contributions. We also thank the Lawrence Livermore National Laboratory for the use of their IBM SP. This work was performed while Adve and Sakellariou were with the Computer Science Department at Rice University and Deelman was with the Computer Science Department at UCLA.
PY - 2002
AB - In this paper, we propose and evaluate practical, automatic techniques that exploit compiler analysis to facilitate simulation of very large message-passing systems. We use compiler techniques and a compiler-synthesized static task graph model to identify the subset of the computations whose values have no significant effect on the performance of the program, and to generate symbolic estimates of the execution times of these computations. For programs with regular computation and communication patterns, this information allows us to avoid executing or simulating large portions of the computational code during the simulation. It also allows us to avoid performing some of the message data transfers, while still simulating the message performance in detail. We have used these techniques to integrate the MPI-Sim parallel simulator at UCLA with the Rice dHPF compiler infrastructure. We evaluate the accuracy and benefits of these techniques for three standard message-passing benchmarks on a wide range of problem and system sizes. The optimized simulator has errors of less than 16% compared with direct program measurement in all the cases we studied, and typically much smaller errors. Furthermore, it requires 5 to 2000 times less memory and up to 10 times less time to execute than the original simulator. These dramatic savings allow us to simulate regular message-passing programs on systems and problem sizes 10 to 100 times larger than is possible with the original simulator or other current state-of-the-art simulators.
KW - Parallel simulation
KW - Parallelizing compilers
KW - Performance modeling
UR - http://www.scopus.com/inward/record.url?scp=0036205781&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036205781&partnerID=8YFLogxK
DO - 10.1006/jpdc.2001.1800
M3 - Article
AN - SCOPUS:0036205781
SN - 0743-7315
VL - 62
SP - 393
EP - 426
JF - Journal of Parallel and Distributed Computing
IS - 3
ER -