TY - GEN
T1 - A framework for collective personalized communication
AU - Kalé, Laxmikant V.
AU - Kumar, Sameer
AU - Varadarajan, Krishnan
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - The paper explores collective personalized communication. For example, in all-to-all personalized communication (AAPC), each processor sends a distinct message to every other processor. However, for many applications, the collective communication pattern is many-to-many, where each processor sends a distinct message to a subset of processors. We first present strategies that reduce per-message cost to optimize AAPC. We then present performance results of these strategies in both all-to-all and many-to-many scenarios. These strategies are implemented in a flexible, asynchronous library with a non-blocking interface, and a message-driven runtime system. This allows the collective communication to run concurrently with the application, if desired. As a result the computational overhead of the communication is substantially reduced, at least on machines such as PSC Lemieux, which sport a co-processor capable of remote DMA. We demonstrate the advantages of our framework with performance results on several benchmarks and applications.
UR - http://www.scopus.com/inward/record.url?scp=84947212732&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947212732&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2003.1213166
DO - 10.1109/IPDPS.2003.1213166
M3 - Conference contribution
AN - SCOPUS:84947212732
T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
BT - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Parallel and Distributed Processing Symposium, IPDPS 2003
Y2 - 22 April 2003 through 26 April 2003
ER -