TY - GEN
T1 - Evaluating HPC Networks via Simulation of Parallel Workloads
AU - Jain, Nikhil
AU - Bhatele, Abhinav
AU - White, Sam
AU - Gamblin, Todd
AU - Kale, Laxmikant V.
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/7/2
Y1 - 2016/7/2
N2 - This paper presents an evaluation and comparison of three topologies that are popular for building interconnection networks in large-scale supercomputers: torus, fat-tree, and dragonfly. To perform this evaluation, we propose a comprehensive methodology and present a scalable packet-level network simulator, TraceR. Our methodology includes designing the prototype systems to be evaluated, using proxy applications to determine computation and communication load, simulating individual proxy applications and multi-job workloads, and computing aggregated performance metrics. Using the proposed methodology, prototype systems based on torus, fat-tree, and dragonfly networks with up to 730K endpoints (MPI processes) executing on 46K nodes are compared in the context of multi-job workloads from capability and capacity systems. For the 180 Petaflop/s prototype systems simulated in this paper, we show that different topologies are superior in different scenarios, i.e., there is no single best topology, and the characteristics of parallel workloads determine the optimal choice.
KW - Computer simulation
KW - High performance computing
KW - Multiprocessor interconnection networks
KW - Network topology
KW - Performance evaluation
UR - http://www.scopus.com/inward/record.url?scp=85017203390&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85017203390&partnerID=8YFLogxK
U2 - 10.1109/SC.2016.13
DO - 10.1109/SC.2016.13
M3 - Conference contribution
AN - SCOPUS:85017203390
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
SP - 154
EP - 165
BT - Proceedings of SC 2016
PB - IEEE Computer Society
T2 - 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016
Y2 - 13 November 2016 through 18 November 2016
ER -