Maximizing Throughput on a Dragonfly Network

Nikhil Jain, Abhinav Bhatele, Xiang Ni, Nicholas J. Wright, Laxmikant V. Kale

Research output: Contribution to journal / Conference article

Abstract

Interconnection networks are a critical resource for large supercomputers. The dragonfly topology, which provides a low network diameter and large bisection bandwidth, is being explored as a promising option for building multi-Petaflop/s and Exaflop/s systems. Unlike the extensively studied torus networks, the best choices of message routing and job placement strategies for the dragonfly topology are not well understood. This paper aims at analyzing the behavior of a machine built using a dragonfly network for various routing strategies, job placement policies, and application communication patterns. Our study is based on a novel model that predicts traffic on individual links for direct, indirect, and adaptive routing strategies. We analyze results for individual communication patterns and some common parallel job workloads. The predictions presented in this paper are for a 100+ Petaflop/s prototype machine with 92,160 high-radix routers and 8.8 million cores.

Original language: English (US)
Article number: 7013015
Pages (from-to): 336-347
Number of pages: 12
Journal: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2015-January
Issue number: January
DOI: 10.1109/SC.2014.33
State: Published - Jan 16 2014
Event: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014 - New Orleans, United States
Duration: Nov 16 2014 to Nov 21 2014

Fingerprint

Throughput
Topology
Supercomputers
Communication
Routers
Bandwidth

Keywords

  • dragonfly networks
  • job placement
  • modeling
  • prediction
  • routing

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Maximizing Throughput on a Dragonfly Network. / Jain, Nikhil; Bhatele, Abhinav; Ni, Xiang; Wright, Nicholas J.; Kale, Laxmikant V.

In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Vol. 2015-January, No. January, 7013015, 16.01.2014, p. 336-347.

@article{a66c9a6a2d8b4f2ba674a02c0e6fc390,
title = "Maximizing Throughput on a Dragonfly Network",
keywords = "dragonfly networks, job placement, modeling, prediction, routing",
author = "Nikhil Jain and Abhinav Bhatele and Xiang Ni and Wright, {Nicholas J.} and Kale, {Laxmikant V}",
year = "2014",
month = "1",
day = "16",
doi = "10.1109/SC.2014.33",
language = "English (US)",
volume = "2015-January",
pages = "336--347",
journal = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
issn = "2167-4329",
number = "January",

}

TY - JOUR

T1 - Maximizing Throughput on a Dragonfly Network

AU - Jain, Nikhil

AU - Bhatele, Abhinav

AU - Ni, Xiang

AU - Wright, Nicholas J.

AU - Kale, Laxmikant V

PY - 2014/1/16

Y1 - 2014/1/16

KW - dragonfly networks

KW - job placement

KW - modeling

KW - prediction

KW - routing

UR - http://www.scopus.com/inward/record.url?scp=84936949804&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84936949804&partnerID=8YFLogxK

U2 - 10.1109/SC.2014.33

DO - 10.1109/SC.2014.33

M3 - Conference article

AN - SCOPUS:84936949804

VL - 2015-January

SP - 336

EP - 347

JO - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

JF - International Conference for High Performance Computing, Networking, Storage and Analysis, SC

SN - 2167-4329

IS - January

M1 - 7013015

ER -