TY - GEN
T1 - MPI versus MPI+OpenMP on the IBM SP for the NAS benchmarks
AU - Cappello, Franck
AU - Etiemble, Daniel
N1 - Publisher Copyright: © 2000 IEEE.
PY - 2000
Y1 - 2000
AB - The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written by using the MPI standard. To evaluate the pertinence of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (OpenMP fine grain parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. The superiority of one model depends on 1) the level of shared memory model parallelization, 2) the communication patterns and 3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance for selecting one model. With the used hybrid model, our results show that a unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make the communication performance significant and the level of parallelization is sufficient.
UR - http://www.scopus.com/inward/record.url?scp=85054165140&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054165140&partnerID=8YFLogxK
U2 - 10.1109/SC.2000.10001
DO - 10.1109/SC.2000.10001
M3 - Conference contribution
AN - SCOPUS:85054165140
T3 - Proceedings of the International Conference on Supercomputing
BT - SC 2000 - Proceedings of the 2000 ACM/IEEE Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 2000 ACM/IEEE Conference on Supercomputing, SC 2000
Y2 - 4 November 2000 through 10 November 2000
ER -