Measuring congestion in high-performance datacenter interconnects

Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Benjamin Lim, Mike Showerman, Greg Bauer, Larry Kaplan, Zbigniew Kalbarczyk, William Kramer, Ravi Iyer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

While it is widely acknowledged that network congestion in High Performance Computing (HPC) systems can significantly degrade application performance, there has been little to no quantification of congestion on credit-based interconnect networks. We present a methodology for detecting, extracting, and characterizing regions of congestion in networks. We have implemented the methodology in a deployable tool, Monet, which can provide such analysis and feedback at runtime. Using Monet, we characterize and diagnose congestion in the world's largest 3D torus network of Blue Waters, a 13.3-petaflop supercomputer at the National Center for Supercomputing Applications. Our study deepens the understanding of production congestion at a scale that has never been evaluated before.

Original language: English (US)
Title of host publication: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020
Publisher: USENIX Association
Pages: 37-57
Number of pages: 21
ISBN (Electronic): 9781939133137
State: Published - 2020
Event: 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020 - Santa Clara, United States
Duration: Feb 25 2020 to Feb 27 2020

Publication series

Name: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020

Conference

Conference: 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020
Country/Territory: United States
City: Santa Clara
Period: 2/25/20 to 2/27/20

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Control and Systems Engineering
