Abstract

We present techniques for characterizing the bandwidth and congestion behavior of supercomputer High-Speed Networks (HSNs). By taking a link-level perspective, we gain generality over analyses that are tied to specific topologies. We illustrate these techniques using five months of a Blue Waters production dataset consisting of network utilization and congestion counters. We find that: (i) the execution time of communication-heavy applications is highly correlated with network stalls observed in the network topology, and the increase in application runtime can be as high as 1.7x with a nominal increase in stalls; (ii) heterogeneity in the available link bandwidth in the network can lead to backpressure and congestion even when the network is not underprovisioned; and (iii) links connected to I/O nodes are no more likely to observe congestion during operational hours than any other link in the system.

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 562-570
Number of pages: 9
ISBN (Electronic): 9781538683194
State: Published - Oct 29 2018
Event: 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018 - Belfast, United Kingdom
Duration: Sep 10 2018 to Sep 13 2018

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2018-September
ISSN (Print): 1552-5244

Other

Other: 2018 IEEE International Conference on Cluster Computing, CLUSTER 2018
Country/Territory: United Kingdom
City: Belfast
Period: 9/10/18 to 9/13/18

Keywords

  • Congestion characterization
  • Network congestion
  • Network congestion visualization

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing

Fingerprint

Dive into the research topics of 'Characterizing Supercomputer Traffic Networks Through Link-Level Analysis'. Together they form a unique fingerprint.
