Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference

Nikhil Jain, Abhinav Bhatele, Xiang Ni, Todd Gamblin, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

On most supercomputers, except some torus-network-based systems, resource managers allocate nodes to jobs without considering the sharing of network resources by different jobs. Such network-oblivious resource allocations result in link sharing among multiple jobs that can cause significant performance variability and performance degradation for individual jobs. In this paper, we explore low-diameter networks and corresponding node allocation policies that can eliminate inter-job interference. We propose a variation of n-dimensional mesh networks called express mesh. An express mesh is denser than the corresponding mesh network, has a low diameter independent of the number of routers, and is easily partitionable. We compare structural properties and performance of express mesh with other popular low-diameter networks. We present practical node allocation policies for express mesh and fat-tree networks that not only eliminate inter-job interference and performance variability, but also improve overall performance.

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 439-448
Number of pages: 10
ISBN (Electronic): 9781538639146
DOIs: https://doi.org/10.1109/IPDPS.2017.91
State: Published - Jun 30 2017
Event: 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017 - Orlando, United States
Duration: May 29 2017 to Jun 2 2017

Publication series

Name: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

Other

Other: 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017
Country: United States
City: Orlando
Period: 5/29/17 to 6/2/17

Keywords

  • Network topology
  • express mesh
  • inter-job interference
  • partitionability
  • simulation

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Jain, N., Bhatele, A., Ni, X., Gamblin, T., & Kale, L. V. (2017). Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference. In Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017 (pp. 439-448). [7967133] (Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2017.91