Avoiding hot-spots on two-level direct networks

Abhinav Bhatele, Nikhil Jain, William D. Gropp, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A low-diameter, fast interconnection network is going to be a prerequisite for building exascale machines. A two-level direct network has been proposed by several groups as a scalable design for future machines. IBM's PERCS topology and the dragonfly network discussed in the DARPA exascale hardware study are examples of this design. The presence of multiple levels in this design leads to hot-spots on a few links when processes are grouped together at the lowest level to minimize total communication volume. This is especially true for communication graphs with a small number of neighbors per task. Routing and mapping choices can impact the communication performance of parallel applications running on a machine with a two-level direct topology. This paper explores intelligent topology-aware mappings of different communication patterns to the physical topology to identify cases that minimize link utilization. We also analyze the trade-offs between using direct and indirect routing with different mappings. We use simulations to study communication and overall performance of applications since there are no installations of two-level direct networks yet. This study raises interesting issues regarding the choice of job scheduling, routing and mapping for future machines.

Original language: English (US)
Title of host publication: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
State: Published - 2011
Event: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11 - Seattle, WA, United States
Duration: Nov 12 2011 → Nov 18 2011

Publication series

Name: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Other

Other: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
Country/Territory: United States
City: Seattle, WA
Period: 11/12/11 → 11/18/11

Keywords

  • Communication
  • Dragonfly network
  • Exascale
  • Mapping
  • Performance

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
