TY - GEN
T1 - A study of network congestion in two supercomputing high-speed interconnects
AU - Jha, Saurabh
AU - Patke, Archit
AU - Brandt, Jim
AU - Gentile, Ann
AU - Showerman, Mike
AU - Roman, Eric
AU - Kalbarczyk, Zbigniew T.
AU - Kramer, Bill
AU - Iyer, Ravishankar K.
N1 - Funding Information:
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award Number 2015-02674. This work is partially supported by NSF CNS 13-14891, and an IBM faculty award.
Funding Information:
Sandia National Laboratories (SNL) is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
Funding Information:
This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
Funding Information:
This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - Network congestion in high-speed interconnects is a major source of application runtime performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for congestion control at the network level and in application placement, mapping, and scheduling at the system level. However, these studies are based on proxy applications and benchmarks that are not representative of field-congestion characteristics of high-speed interconnects. To address this gap, we present (a) an end-to-end framework for monitoring and analysis to support long-term field-congestion characterization studies, and (b) an empirical study of network congestion in petascale systems across two different interconnect technologies: (i) Cray Gemini, which uses a 3-D torus topology, and (ii) Cray Aries, which uses the DragonFly topology.
AB - Network congestion in high-speed interconnects is a major source of application runtime performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for congestion control at the network level and in application placement, mapping, and scheduling at the system level. However, these studies are based on proxy applications and benchmarks that are not representative of field-congestion characteristics of high-speed interconnects. To address this gap, we present (a) an end-to-end framework for monitoring and analysis to support long-term field-congestion characterization studies, and (b) an empirical study of network congestion in petascale systems across two different interconnect technologies: (i) Cray Gemini, which uses a 3-D torus topology, and (ii) Cray Aries, which uses the DragonFly topology.
KW - Congestion
KW - Cray Aries
KW - Cray Gemini
KW - DragonFly
KW - HPC
KW - Network
KW - Torus
UR - http://www.scopus.com/inward/record.url?scp=85076149891&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076149891&partnerID=8YFLogxK
U2 - 10.1109/HOTI.2019.00024
DO - 10.1109/HOTI.2019.00024
M3 - Conference contribution
AN - SCOPUS:85076149891
T3 - Proceedings - 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019
SP - 45
EP - 48
BT - Proceedings - 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019
Y2 - 14 August 2019 through 16 August 2019
ER -