TY - GEN
T1 - Communication analysis of parallel 3D FFT for flat Cartesian meshes on large Blue Gene systems
AU - Chan, Anthony
AU - Balaji, Pavan
AU - Gropp, William
AU - Thakur, Rajeev
N1 - Funding Information:
This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. We also acknowledge IBM for allowing us to use their BG-Watson system for our experiments. Finally, we thank Joerg Schumacher for providing us with his test code, which allowed us to understand the scalability issues with P3DFFT on flat Cartesian meshes.
PY - 2008
Y1 - 2008
N2 - Parallel 3D FFT is a commonly used numerical method in scientific computing. P3DFFT is a recently proposed implementation of parallel 3D FFT that is designed to allow scalability to massively large systems such as Blue Gene. While there has been recent work that demonstrates such scalability on regular Cartesian meshes (equal length in each dimension), its performance and scalability for flat Cartesian meshes (much smaller length in one dimension) are still a concern. In this paper, we perform studies on a 16-rack (16384-node) Blue Gene/L system that demonstrate that a combination of the network topology and the communication pattern of P3DFFT can result in early network saturation and consequently performance loss. We also show that remapping processes on nodes and rotating the mesh, taking the communication properties of P3DFFT into consideration, can help alleviate this problem and improve performance by up to 48% in some special cases.
AB - Parallel 3D FFT is a commonly used numerical method in scientific computing. P3DFFT is a recently proposed implementation of parallel 3D FFT that is designed to allow scalability to massively large systems such as Blue Gene. While there has been recent work that demonstrates such scalability on regular Cartesian meshes (equal length in each dimension), its performance and scalability for flat Cartesian meshes (much smaller length in one dimension) are still a concern. In this paper, we perform studies on a 16-rack (16384-node) Blue Gene/L system that demonstrate that a combination of the network topology and the communication pattern of P3DFFT can result in early network saturation and consequently performance loss. We also show that remapping processes on nodes and rotating the mesh, taking the communication properties of P3DFFT into consideration, can help alleviate this problem and improve performance by up to 48% in some special cases.
UR - http://www.scopus.com/inward/record.url?scp=58449124711&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58449124711&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-89894-8_32
DO - 10.1007/978-3-540-89894-8_32
M3 - Conference contribution
AN - SCOPUS:58449124711
SN - 354089893X
SN - 9783540898931
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 350
EP - 364
BT - High Performance Computing - HiPC 2008 - 15th International Conference, Proceedings
PB - Springer
T2 - 15th International Conference on High Performance Computing, HiPC 2008
Y2 - 17 December 2008 through 20 December 2008
ER -