Analysis of the tradeoffs between energy and run time for multilevel checkpointing

Prasanna Balaprakash, Leonardo A. Bautista Gomez, Mohamed Slim Bouguerra, Stefan M. Wild, Franck Cappello, Paul D. Hovland

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In high-performance computing, there is a perpetual hunt for performance and scalability. Supercomputers grow larger, offering improved computational science throughput. Nevertheless, with an increase in the number of system components and their interactions, the number of failures and the power consumption will increase rapidly. Energy and reliability are among the most challenging issues that need to be addressed for extreme-scale computing. We develop analytical models for run time and energy usage for multilevel fault-tolerance schemes. We use these models to study the tradeoff between run time and energy in FTI, a recently developed multilevel checkpoint library, on an IBM Blue Gene/Q. Our results show that the energy consumed by FTI is low and that the tradeoff between run time and energy is small. Using the analytical models, we explore the impact of various system-level parameters on run time and energy tradeoffs.

Original language: English (US)
Title of host publication: High Performance Computing Systems
Subtitle of host publication: Performance Modeling, Benchmarking, and Simulation - 5th International Workshop, PMBS 2014, Revised Selected Papers
Editors: Simon D. Hammond, Stephen A. Jarvis, Steven A. Wright
Publisher: Springer
Pages: 249-263
Number of pages: 15
ISBN (Print): 9783319172477
State: Published - 2015
Event: 5th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS 2014 - New Orleans, United States
Duration: Nov 16 2014 → Nov 16 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8966
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS 2014
Country/Territory: United States
City: New Orleans
Period: 11/16/14 → 11/16/14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
