TY - GEN
T1 - Characterizing Supercomputer Traffic Networks Through Link-Level Analysis
AU - Jha, Saurabh
AU - Brandt, Jim
AU - Gentile, Ann
AU - Kalbarczyk, Zbigniew
AU - Iyer, Ravishankar
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/29
Y1 - 2018/10/29
N2 - We present techniques for characterizing the bandwidth and congestion behavior of supercomputer High-Speed Networks (HSN). By utilizing a link-level perspective, we gain generality over analyses that are tied to specific topologies. We illustrate these techniques using five months of a Blue Waters production dataset consisting of network utilization and congestion counters. We find that: (i) the execution time of communication-heavy applications is highly correlated with network stalls observed in the network topology, and the increase in application runtime can be as high as 1.7x with a nominal increase in stalls; (ii) heterogeneity in the available link bandwidth in the network can lead to backpressure and congestion even when the network is not underprovisioned; and (iii) links connected to I/O nodes are no more likely to observe congestion during operational hours than any other link in the system.
AB - We present techniques for characterizing the bandwidth and congestion behavior of supercomputer High-Speed Networks (HSN). By utilizing a link-level perspective, we gain generality over analyses that are tied to specific topologies. We illustrate these techniques using five months of a Blue Waters production dataset consisting of network utilization and congestion counters. We find that: (i) the execution time of communication-heavy applications is highly correlated with network stalls observed in the network topology, and the increase in application runtime can be as high as 1.7x with a nominal increase in stalls; (ii) heterogeneity in the available link bandwidth in the network can lead to backpressure and congestion even when the network is not underprovisioned; and (iii) links connected to I/O nodes are no more likely to observe congestion during operational hours than any other link in the system.
KW - Congestion characterization
KW - Network congestion
KW - Network congestion visualization
UR - http://www.scopus.com/inward/record.url?scp=85057253071&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057253071&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2018.00072
DO - 10.1109/CLUSTER.2018.00072
M3 - Conference contribution
AN - SCOPUS:85057253071
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 562
EP - 570
BT - Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Y2 - 10 September 2018 through 13 September 2018
ER -