TY - GEN
T1 - POSTER
T2 - 16th IEEE International Conference on Cluster Computing, CLUSTER 2014
AU - Gomez, Leonardo A. Bautista
AU - Balaprakash, Prasanna
AU - Bouguerra, Mohamed Slim
AU - Wild, Stefan M.
AU - Cappello, Franck
AU - Hovland, Paul D.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/26
Y1 - 2014/11/26
N2 - Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and study FTI, a multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar.
AB - Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and study FTI, a multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar.
UR - http://www.scopus.com/inward/record.url?scp=84917726748&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84917726748&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2014.6968749
DO - 10.1109/CLUSTER.2014.6968749
M3 - Conference contribution
AN - SCOPUS:84917726748
T3 - 2014 IEEE International Conference on Cluster Computing, CLUSTER 2014
SP - 278
EP - 279
BT - 2014 IEEE International Conference on Cluster Computing, CLUSTER 2014
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 September 2014 through 26 September 2014
ER -