TY - GEN
T1 - Holistic measurement-driven system assessment
AU - Jha, Saurabh
AU - Brandt, Jim
AU - Gentile, Ann
AU - Kalbarczyk, Zbigniew T
AU - Bauer, Gregory H
AU - Enos, Jeremy James
AU - Showerman, Michael
AU - Kaplan, Larry
AU - Bode, Brett
AU - Greiner, Annette
AU - Bonnie, Amanda
AU - Mason, Mike
AU - Iyer, Ravishankar K
AU - Kramer, William T
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/9/22
Y1 - 2017/9/22
N2 - In high-performance computing systems, application performance and throughput are dependent on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, for improved root cause diagnosis, and for predicting performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.
AB - In high-performance computing systems, application performance and throughput are dependent on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, for improved root cause diagnosis, and for predicting performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.
UR - http://www.scopus.com/inward/record.url?scp=85032632760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032632760&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2017.124
DO - 10.1109/CLUSTER.2017.124
M3 - Conference contribution
AN - SCOPUS:85032632760
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 797
EP - 800
BT - Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
Y2 - 5 September 2017 through 8 September 2017
ER -