TY - GEN
T1 - Performance modeling and tuning of an unstructured mesh CFD application
AU - Gropp, William D.
AU - Kaushik, Dinesh K.
AU - Keyes, David E.
AU - Smith, Barry F.
N1 - This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. This work was supported by a GAANN Fellowship from the U.S. Department of Education and by Argonne National Laboratory under contract 983572401. This work was supported by the National Science Foundation under grant ECS-9527169, by NASA under contracts NAS1-19480 and NAS1-97046, by Argonne National Laboratory under contract 982232402, and by Lawrence Livermore National Laboratory under subcontract B347882. Collaborations with Lois C. McInnes, Satish Balay, W. Kyle Anderson, and Dimitri Mavriplis were critical to the work leading up to this paper. Debbie Swider's assistance with many of the ASCI platform runs is gratefully acknowledged. Computer time was supplied by Argonne National Laboratory, Lawrence Livermore National Laboratory, National Energy Research Scientific Computing Center (NERSC), Sandia National Laboratories, and SGI-Cray.
PY - 2000
Y1 - 2000
N2 - This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and ported to several large-scale machines, including the ASCI Red and Blue Pacific machines, the SGI Origin, the Cray T3E, and Beowulf clusters. The code achieves a respectable level of performance for sparse problems, typical of scientific and engineering codes based on partial differential equations, and scales well up to thousands of processors. Since the gap between CPU speed and memory access rate is widening, the code is analyzed from a memory-centric perspective (in contrast to traditional flop-orientation) to understand its sequential and parallel performance. Performance tuning is approached on three fronts: data layouts to enhance locality of reference, algorithmic parameters, and parallel programming model. This effort was guided partly by some simple performance models developed for the sparse matrix-vector product operation.
UR - https://www.scopus.com/pages/publications/0005354058
U2 - 10.1109/SC.2000.10059
DO - 10.1109/SC.2000.10059
M3 - Conference contribution
AN - SCOPUS:0005354058
T3 - Proceedings of the International Conference on Supercomputing
BT - SC 2000 - Proceedings of the 2000 ACM/IEEE Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 2000 ACM/IEEE Conference on Supercomputing, SC 2000
Y2 - 4 November 2000 through 10 November 2000
ER -