Mitigating processor variation through dynamic load balancing

Bilge Acun, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

There can be performance variation among same-model processors in large scale clusters, and supercomputers that are caused by power, and temperature variations among the processors. These variations manifest itself as frequency difference of the processors under dynamic overclocking, such as Turbo Boost. Different-model processors also create an inherent variation when used in same cluster. For some tightly coupled HPC applications even one slow processor in the critical path can slow down the whole application therefore this variation is an important problem. To mitigate the performance variation among processors, we propose a speed-aware dynamic load balancing strategy which works on both homogeneous and non-homogeneous hardware. Our main idea is to provide an estimation of the task completion time based when moving a task from one processor to another on the processor speed. We show up to 30% performance improvement using our speed-aware load balancer compared to the no load balancing case. We also show that our speed-aware balancer performs 5% better than non-speed aware counterpart.

Original languageEnglish (US)
Title of host publicationProceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1073-1076
Number of pages4
ISBN (Electronic)9781509021406
DOIs
StatePublished - Jul 18 2016
Event30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States
Duration: May 23 2016May 27 2016

Publication series

NameProceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Other

Other30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Country/TerritoryUnited States
CityChicago
Period5/23/165/27/16

Keywords

  • Frequency
  • Turbo boost
  • Variation

ASJC Scopus subject areas

  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Mitigating processor variation through dynamic load balancing'. Together they form a unique fingerprint.

Cite this