Holistic measurement-driven system assessment

Saurabh Jha, Jim Brandt, Ann Gentile, Zbigniew T Kalbarczyk, Gregory H Bauer, Jeremy James Enos, Michael Showerman, Larry Kaplan, Brett Bode, Annette Greiner, Amanda Bonnie, Mike Mason, Ravishankar K Iyer, William T Kramer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In high-performance computing systems, application performance and throughput depend on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of the impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, improve root cause diagnosis, and predict performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize the propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 797-800
Number of pages: 4
ISBN (Electronic): 9781538623268
DOIs: https://doi.org/10.1109/CLUSTER.2017.124
State: Published - Sep 22 2017
Event: 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 - Honolulu, United States
Duration: Sep 5 2017 - Sep 8 2017

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2017-September
ISSN (Print): 1552-5244

Other

Other: 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
Country: United States
City: Honolulu
Period: 9/5/17 - 9/8/17

Fingerprint

  • Throughput
  • Hardware
  • Monitoring

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing

Cite this

Jha, S., Brandt, J., Gentile, A., Kalbarczyk, Z. T., Bauer, G. H., Enos, J. J., ... Kramer, W. T. (2017). Holistic measurement-driven system assessment. In Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 (pp. 797-800). [8049019] (Proceedings - IEEE International Conference on Cluster Computing, ICCC; Vol. 2017-September). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CLUSTER.2017.124

Holistic measurement-driven system assessment. / Jha, Saurabh; Brandt, Jim; Gentile, Ann; Kalbarczyk, Zbigniew T; Bauer, Gregory H; Enos, Jeremy James; Showerman, Michael; Kaplan, Larry; Bode, Brett; Greiner, Annette; Bonnie, Amanda; Mason, Mike; Iyer, Ravishankar K; Kramer, William T.

Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 797-800, 8049019 (Proceedings - IEEE International Conference on Cluster Computing, ICCC; Vol. 2017-September).


Jha, S, Brandt, J, Gentile, A, Kalbarczyk, ZT, Bauer, GH, Enos, JJ, Showerman, M, Kaplan, L, Bode, B, Greiner, A, Bonnie, A, Mason, M, Iyer, RK & Kramer, WT 2017, Holistic measurement-driven system assessment. in Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017, 8049019, Proceedings - IEEE International Conference on Cluster Computing, ICCC, vol. 2017-September, Institute of Electrical and Electronics Engineers Inc., pp. 797-800, 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017, Honolulu, United States, 9/5/17. https://doi.org/10.1109/CLUSTER.2017.124
Jha S, Brandt J, Gentile A, Kalbarczyk ZT, Bauer GH, Enos JJ et al. Holistic measurement-driven system assessment. In Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 797-800. 8049019. (Proceedings - IEEE International Conference on Cluster Computing, ICCC). https://doi.org/10.1109/CLUSTER.2017.124
Jha, Saurabh ; Brandt, Jim ; Gentile, Ann ; Kalbarczyk, Zbigniew T ; Bauer, Gregory H ; Enos, Jeremy James ; Showerman, Michael ; Kaplan, Larry ; Bode, Brett ; Greiner, Annette ; Bonnie, Amanda ; Mason, Mike ; Iyer, Ravishankar K ; Kramer, William T. / Holistic measurement-driven system assessment. Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 797-800 (Proceedings - IEEE International Conference on Cluster Computing, ICCC).
@inproceedings{0b55a75ff00d465485f04fd3db6400e6,
title = "Holistic measurement-driven system assessment",
abstract = "In high-performance computing systems, application performance and throughput depend on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of the impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, improve root cause diagnosis, and predict performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize the propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.",
author = "Saurabh Jha and Jim Brandt and Ann Gentile and Kalbarczyk, {Zbigniew T} and Bauer, {Gregory H} and Enos, {Jeremy James} and Michael Showerman and Larry Kaplan and Brett Bode and Annette Greiner and Amanda Bonnie and Mike Mason and Iyer, {Ravishankar K} and Kramer, {William T}",
year = "2017",
month = "9",
day = "22",
doi = "10.1109/CLUSTER.2017.124",
language = "English (US)",
series = "Proceedings - IEEE International Conference on Cluster Computing, ICCC",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "797--800",
booktitle = "Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017",
address = "United States",

}

TY  - GEN
T1  - Holistic measurement-driven system assessment
AU  - Jha, Saurabh
AU  - Brandt, Jim
AU  - Gentile, Ann
AU  - Kalbarczyk, Zbigniew T
AU  - Bauer, Gregory H
AU  - Enos, Jeremy James
AU  - Showerman, Michael
AU  - Kaplan, Larry
AU  - Bode, Brett
AU  - Greiner, Annette
AU  - Bonnie, Amanda
AU  - Mason, Mike
AU  - Iyer, Ravishankar K
AU  - Kramer, William T
PY  - 2017/9/22
Y1  - 2017/9/22
N2  - In high-performance computing systems, application performance and throughput depend on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of the impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, improve root cause diagnosis, and predict performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize the propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.
AB  - In high-performance computing systems, application performance and throughput depend on a complex interplay of hardware and software subsystems and variable workloads with competing resource demands. Data-driven insights into the potentially widespread scope and propagation of the impact of events, such as faults and contention for shared resources, can be used to drive more effective use of resources, improve root cause diagnosis, and predict performance impacts. We present work developing integrated capabilities for holistic monitoring and analysis to understand and characterize the propagation of performance-degrading events. These characterizations can be used to determine and invoke mitigating responses by system administrators, applications, and system software.
UR  - http://www.scopus.com/inward/record.url?scp=85032632760&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85032632760&partnerID=8YFLogxK
U2  - 10.1109/CLUSTER.2017.124
DO  - 10.1109/CLUSTER.2017.124
M3  - Conference contribution
AN  - SCOPUS:85032632760
T3  - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP  - 797
EP  - 800
BT  - Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -