Thermal aware automated load balancing for HPC applications

Harshitha Menon, Bilge Acun, Simon Garcia De Gonzalo, Osman Sarood, Laxmikant V Kale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As we move towards the exascale era, power and energy have become major challenges. Some supercomputers draw more than 10 megawatts, leading to high energy bills, and a significant portion of this energy is spent on cooling. In this paper, we propose an adaptive control system that minimizes cooling energy by using Dynamic Voltage and Frequency Scaling (DVFS) to control core temperature and by performing load balancing to counter the imbalance that per-core frequency changes introduce. This framework, which is part of the adaptive runtime system, monitors system and application characteristics and triggers mechanisms to limit the temperature. It also performs load balancing whenever imbalance is detected and load balancing is expected to be beneficial. Using a set of applications and benchmarks, we demonstrate that the proposed framework controls core temperature effectively and reduces the timing penalty automatically, without any support from the user.
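The two decisions described in the abstract (throttle a hot core with DVFS; rebalance only when imbalance is present and worth fixing) can be illustrated with a minimal sketch. This is not the framework from the paper: the frequency levels, the temperature threshold and hysteresis margin, the max/average imbalance metric, the cost model, and every name below (`Core`, `adjust_frequency`, `should_balance`) are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical DVFS levels in GHz, highest first (assumption, not from the paper).
FREQ_LEVELS = [2.4, 2.2, 2.0, 1.8, 1.6, 1.4, 1.2]


@dataclass
class Core:
    temp_c: float   # current core temperature in degrees Celsius
    level: int = 0  # index into FREQ_LEVELS; 0 is the highest frequency

    @property
    def freq_ghz(self) -> float:
        return FREQ_LEVELS[self.level]


def adjust_frequency(core: Core, threshold_c: float, margin_c: float = 2.0) -> None:
    """Hypothetical thermal controller: step down when above the threshold,
    step back up only when below (threshold - margin), which adds hysteresis
    so the core does not oscillate around the threshold."""
    if core.temp_c > threshold_c and core.level < len(FREQ_LEVELS) - 1:
        core.level += 1  # too hot: lower frequency by one step
    elif core.temp_c < threshold_c - margin_c and core.level > 0:
        core.level -= 1  # comfortably cool: raise frequency by one step


def should_balance(core_times: List[float], lb_cost_s: float,
                   iters_remaining: int, tol: float = 1.05) -> bool:
    """Hypothetical benefit test. core_times are measured seconds per
    iteration for each core (slower, throttled cores show larger times).
    Assumes ideal balancing would bring the slowest core down to the average."""
    avg = sum(core_times) / len(core_times)
    if avg <= 0:
        return False
    slowest = max(core_times)
    imbalanced = slowest / avg > tol                       # imbalance detected
    saving_s = (slowest - avg) * iters_remaining           # estimated time saved
    return imbalanced and saving_s > lb_cost_s             # balancing pays off


if __name__ == "__main__":
    cores = [Core(temp_c=t) for t in (58.0, 66.5, 61.0, 49.0)]
    for c in cores:
        adjust_frequency(c, threshold_c=60.0)
    print([c.freq_ghz for c in cores])  # [2.4, 2.2, 2.2, 2.4]
    print(should_balance([1.0, 1.3, 1.25, 0.95], lb_cost_s=0.5, iters_remaining=100))
```

The hysteresis margin and the cost/benefit check are common design choices for keeping a controller from reacting to noise; the paper's actual policies and thresholds may differ.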

Original language: English (US)
Title of host publication: 2013 IEEE International Conference on Cluster Computing, CLUSTER 2013
State: Published - Dec 1 2013
Event: 15th IEEE International Conference on Cluster Computing, CLUSTER 2013 - Indianapolis, IN, United States
Duration: Sep 23 2013 to Sep 27 2013

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Other

Other: 15th IEEE International Conference on Cluster Computing, CLUSTER 2013
Country: United States
City: Indianapolis, IN
Period: 9/23/13 to 9/27/13

Keywords

  • automated
  • dvfs
  • energy consumption
  • load balancing
  • parallel applications
  • run-time system

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing

