TY - GEN
T1 - High Performance Fortran compilation techniques for parallelizing scientific codes
AU - Adve, Vikram
AU - Jin, Guohua
AU - Mellor-Crummey, John
AU - Yi, Qing
N1 - Publisher Copyright:
© 1998 IEEE.
PY - 1998
Y1 - 1998
N2 - With current compilers for High Performance Fortran (HPF), substantial restructuring and hand-optimization may be required to obtain acceptable performance from an HPF port of an existing Fortran application. A key goal of the Rice dHPF compiler project is to develop optimization techniques that can provide consistently high performance for a broad spectrum of scientific applications with minimal restructuring of existing Fortran 77 or Fortran 90 applications. This paper presents four new optimization techniques we developed to support efficient parallelization of codes with minimal restructuring. These optimizations include computation partition selection for loop nests that use privatizable arrays, along with partial replication of boundary computations to reduce communication overhead; communication-sensitive loop distribution to eliminate inner-loop communications; interprocedural selection of computation partitions; and data availability analysis to eliminate redundant communications. We studied the effectiveness of the dHPF compiler, which incorporates these optimizations, in parallelizing serial versions of the NAS SP and BT application benchmarks. We present experimental results comparing the performance of hand-written MPI code for the benchmarks against code generated from HPF using the dHPF compiler and the Portland Group's pghpf compiler. Using the compilation techniques described in this paper, we achieve performance within 15% of hand-written MPI code on 25 processors for BT and within 33% for SP. Furthermore, these results are obtained with HPF versions of the benchmarks that were created with minimal restructuring of the serial code (modifying only approximately 5% of the code).
KW - HPF
KW - NAS benchmarks
KW - Parallelizing compiler
UR - http://www.scopus.com/inward/record.url?scp=33746696661&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746696661&partnerID=8YFLogxK
U2 - 10.1109/SC.1998.10034
DO - 10.1109/SC.1998.10034
M3 - Conference contribution
AN - SCOPUS:33746696661
T3 - Proceedings of the International Conference on Supercomputing
BT - SC 1998 - Proceedings of the ACM/IEEE Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 1998 ACM/IEEE Conference on Supercomputing, SC 1998
Y2 - 7 November 1998 through 13 November 1998
ER -