A study of network congestion in two supercomputing high-speed interconnects

Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Mike Showerman, Eric Roman, Zbigniew T. Kalbarczyk, Bill Kramer, Ravishankar K. Iyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Network congestion in high-speed interconnects is a major source of application runtime performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for congestion control at the network level and in application placement, mapping, and scheduling at the system level. However, these studies are based on proxy applications and benchmarks that are not representative of field-congestion characteristics of high-speed interconnects. To address this gap, we present (a) an end-to-end framework for monitoring and analysis to support long-term field-congestion characterization studies, and (b) an empirical study of network congestion in petascale systems across two different interconnect technologies: (i) Cray Gemini, which uses a 3-D torus topology, and (ii) Cray Aries, which uses the DragonFly topology.

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 45-48
Number of pages: 4
ISBN (Electronic): 9781728155258
State: Published - Aug 2019
Event: 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019 - Santa Clara, United States
Duration: Aug 14 2019 → Aug 16 2019

Publication series

Name: Proceedings - 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019

Conference

Conference: 2019 IEEE Symposium on High-Performance Interconnects, HOTI 2019
Country/Territory: United States
City: Santa Clara
Period: 8/14/19 → 8/16/19

Keywords

  • Congestion
  • Cray Aries
  • Cray Gemini
  • DragonFly
  • HPC
  • Network
  • Torus

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Safety, Risk, Reliability and Quality
