TY - GEN
T1 - A simple, pipelined algorithm for large, irregular all-gather problems
AU - Träff, Jesper Larsson
AU - Ripke, Andreas
AU - Siebert, Christian
AU - Balaji, Pavan
AU - Thakur, Rajeev
AU - Gropp, William
N1 - Funding Information: This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.
PY - 2008
Y1 - 2008
N2 - We present and evaluate a new, simple, pipelined algorithm for large, irregular all-gather problems, useful for the implementation of the MPI_Allgatherv collective operation of MPI. The algorithm can be viewed as an adaptation of a linear ring algorithm for regular all-gather problems for single-ported, clustered multiprocessors to the irregular problem. Compared to the standard ring algorithm, whose performance is dominated by the largest data size broadcast by a process (times the number of processes), the performance of the new algorithm depends only on the total amount of data over all processes. The new algorithm has been implemented within different MPI libraries. Benchmark results on NEC SX-8, Linux clusters with InfiniBand and Gigabit Ethernet, Blue Gene/P, and SiCortex systems show huge performance gains in accordance with the expected behavior.
AB - We present and evaluate a new, simple, pipelined algorithm for large, irregular all-gather problems, useful for the implementation of the MPI_Allgatherv collective operation of MPI. The algorithm can be viewed as an adaptation of a linear ring algorithm for regular all-gather problems for single-ported, clustered multiprocessors to the irregular problem. Compared to the standard ring algorithm, whose performance is dominated by the largest data size broadcast by a process (times the number of processes), the performance of the new algorithm depends only on the total amount of data over all processes. The new algorithm has been implemented within different MPI libraries. Benchmark results on NEC SX-8, Linux clusters with InfiniBand and Gigabit Ethernet, Blue Gene/P, and SiCortex systems show huge performance gains in accordance with the expected behavior.
UR - http://www.scopus.com/inward/record.url?scp=56449099822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56449099822&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-87475-1_16
DO - 10.1007/978-3-540-87475-1_16
M3 - Conference contribution
AN - SCOPUS:56449099822
SN - 3540874747
SN - 9783540874744
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 84
EP - 93
BT - Recent Advances in Parallel Virtual Machine and Message Passing Interface - 15th European PVM/MPI Users' Group Meeting, Proceedings
T2 - 15th European PVM/MPI Users' Group Meeting, EuroPVM/MPI 2008
Y2 - 7 September 2008 through 10 September 2008
ER -