Abstract
In this paper, we study the all-to-all multicast operation. Strategies for all-to-all multicast need to differ for small and large messages: for small messages, the major issue is minimizing software overhead, whereas for large messages, the issue is network contention. Many modern large parallel computers use the fat-tree interconnection topology. We therefore analyze network contention on fat-tree networks and use known contention-free communication schedules for these networks in the design of two novel strategies that optimize collective multicast. We evaluate the performance of these strategies on up to 256 nodes (1024 processors) of an Alpha cluster. We also present schemes that perform well when a contiguous chunk of nodes is not available. For large messages, many of our strategies achieve twice the throughput of native MPI. We also demonstrate that the software overhead of a collective operation is a small fraction of the total completion time in the presence of a communication co-processor. We therefore compare the performance of the studied strategies using both metrics: (i) completion time and (ii) computation overhead.
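To make the operation concrete, the following is a minimal sketch of an all-to-all multicast organized as a sequence of steps in which every node sends to exactly one partner and receives from exactly one partner. The abstract does not describe the paper's actual schedules, so the XOR pairing used here is only an assumed, illustrative choice, and the names `xor_schedule` and `all_to_all_multicast` are hypothetical. The sketch simulates the exchange in memory and makes no claim about contention on a real fat-tree network.

```python
def xor_schedule(num_nodes):
    """Yield one list of (sender, receiver) pairs per communication step.

    Assumption for this sketch: num_nodes is a power of two, so that
    node ^ step is a valid partner and each step is a permutation.
    """
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    for step in range(1, num_nodes):
        yield [(node, node ^ step) for node in range(num_nodes)]


def all_to_all_multicast(messages):
    """Simulate the exchange and return what each node holds at the end.

    messages[i] is the single message that node i multicasts to all others.
    """
    num_nodes = len(messages)
    # Every node starts out holding only its own message.
    received = [{node: messages[node]} for node in range(num_nodes)]
    for pairs in xor_schedule(num_nodes):
        for sender, receiver in pairs:
            received[receiver][sender] = messages[sender]
    return received
```

With `P` nodes the schedule has `P - 1` steps, and after the last step every node holds all `P` messages. Because each step is a permutation, no node is the target of two sends in the same step; whether the links of a particular network are also free of contention depends on the topology and routing, which this sketch does not model.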
Original language | English (US) |
---|---|
Pages | 205-214 |
Number of pages | 10 |
State | Published - 2004 |
Event | Proceedings - Tenth International Conference on Parallel and Distributed Systems (ICPADS 2004) - Newport Beach, CA, United States; Duration: Jul 7 2004 → Jul 9 2004 |
Other
Other | Proceedings - Tenth International Conference on Parallel and Distributed Systems (ICPADS 2004) |
---|---|
Country/Territory | United States |
City | Newport Beach, CA |
Period | 7/7/04 → 7/9/04 |
ASJC Scopus subject areas
- Hardware and Architecture