Power, Reliability, and Performance: One System to Rule them All

Bilge Acun, Akhil Langer, Esteban Meneses, Harshitha Menon, Osman Sarood, Ehsan Totoni, Laxmikant V. Kalé

Research output: Contribution to specialist publication (Article)

Abstract

In a design based on the Charm++ parallel programming framework, an adaptive runtime system dynamically interacts with a datacenter's resource manager to control power by intelligently scheduling jobs, reallocating resources, and reconfiguring hardware. It simultaneously manages reliability by cooling the system to the running application's optimal level and maintains performance through load balancing.

Original language: English (US)
Pages: 30-37
Number of pages: 8
Volume: 49
No: 10
Specialist publication: Computer
State: Published - Oct 2016

Keywords

  • Charm++
  • HPC
  • energy-aware systems
  • energy-efficient computing
  • high-performance computing
  • parallel systems
  • power management
  • temperature-aware design

ASJC Scopus subject areas

  • Computer Science (all)
