Best practices and lessons from deploying and operating a sustained-petascale system: The Blue Waters experience

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Building and operating versatile extreme-scale computing systems that work productively for a range of frontier research domains present many challenges and opportunities. Solutions created, experiences acquired, and lessons learned, while rarely published, could drive the development of new methods and practices and raise the bar for all organizations supporting research, scholarship, and education. This paper describes the methods and procedures developed for deploying, supporting, and continuously improving the Blue Waters system and its services over the last five years. As the first US sustained-petascale computing platform available to the open-science community, Blue Waters pioneered various unique practices, which we share here so that the community can adopt and further improve them. We present our support and service methodologies and the leadership practices we employed to keep the system highly efficient and productive. We also summarize the return on investment of deploying and operating the system.

Original language: English (US)
Title of host publication: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 673-684
Number of pages: 12
ISBN (Electronic): 9781538683842
DOI: 10.1109/SC.2018.00056
State: Published - Mar 11 2019
Event: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States
Duration: Nov 11 2018 to Nov 16 2018

Publication series

Name: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

Conference

Conference: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Country: United States
City: Dallas
Period: 11/11/18 to 11/16/18

Keywords

  • Best practices
  • HPC center
  • System management

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Theoretical Computer Science

Cite this

Bauer, G. H., Bode, B., Enos, J. J., Kramer, W. T., Lathrop, S., Mendes, C. L., & Sisneros, R. R. (2019). Best practices and lessons from deploying and operating a sustained-petascale system: The Blue Waters experience. In Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 (pp. 673-684). [8665815] (Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SC.2018.00056

Best practices and lessons from deploying and operating a sustained-petascale system : The blue waters experience. / Bauer, Gregory H; Bode, Brett; Enos, Jeremy James; Kramer, William T; Lathrop, Scott; Mendes, Celso Luiz; Sisneros, Roberto Reynel.

Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018. Institute of Electrical and Electronics Engineers Inc., 2019. p. 673-684 8665815 (Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Bauer, GH, Bode, B, Enos, JJ, Kramer, WT, Lathrop, S, Mendes, CL & Sisneros, RR 2019, Best practices and lessons from deploying and operating a sustained-petascale system: The blue waters experience. in Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018., 8665815, Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Institute of Electrical and Electronics Engineers Inc., pp. 673-684, 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Dallas, United States, 11/11/18. https://doi.org/10.1109/SC.2018.00056
Bauer GH, Bode B, Enos JJ, Kramer WT, Lathrop S, Mendes CL et al. Best practices and lessons from deploying and operating a sustained-petascale system: The blue waters experience. In Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018. Institute of Electrical and Electronics Engineers Inc. 2019. p. 673-684. 8665815. (Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018). https://doi.org/10.1109/SC.2018.00056
Bauer, Gregory H ; Bode, Brett ; Enos, Jeremy James ; Kramer, William T ; Lathrop, Scott ; Mendes, Celso Luiz ; Sisneros, Roberto Reynel. / Best practices and lessons from deploying and operating a sustained-petascale system : The blue waters experience. Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 673-684 (Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018).
@inproceedings{e70523aefe464472b3825e89d490775b,
title = "Best practices and lessons from deploying and operating a sustained-petascale system: The {Blue Waters} experience",
abstract = "Building and operating versatile extreme-scale computing systems that work productively for a range of frontier research domains present many challenges and opportunities. Solutions created, experiences acquired, and lessons learned, while rarely published, could drive the development of new methods and practices and raise the bar for all organizations supporting research, scholarship, and education. This paper describes the methods and procedures developed for deploying, supporting, and continuously improving the Blue Waters system and its services during the last five years. Being the first US sustained-petascale computing platform available to the open-science community, the Blue Waters project pioneered various unique practices that we are sharing to be adopted and further improved by the community. We present our support and service methodologies, and the leadership practices employed for ensuring that the system stays highly efficient and productive. We also provide the return on investment summaries related to deploying and operating the system.",
keywords = "Best practices, HPC center, System management",
author = "Bauer, {Gregory H} and Brett Bode and Enos, {Jeremy James} and Kramer, {William T} and Scott Lathrop and Mendes, {Celso Luiz} and Sisneros, {Roberto Reynel}",
year = "2019",
month = "3",
day = "11",
doi = "10.1109/SC.2018.00056",
language = "English (US)",
series = "Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "673--684",
booktitle = "Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018",
address = "United States",

}

TY - GEN

T1 - Best practices and lessons from deploying and operating a sustained-petascale system

T2 - The Blue Waters experience

AU - Bauer, Gregory H

AU - Bode, Brett

AU - Enos, Jeremy James

AU - Kramer, William T

AU - Lathrop, Scott

AU - Mendes, Celso Luiz

AU - Sisneros, Roberto Reynel

PY - 2019/3/11

Y1 - 2019/3/11

N2 - Building and operating versatile extreme-scale computing systems that work productively for a range of frontier research domains present many challenges and opportunities. Solutions created, experiences acquired, and lessons learned, while rarely published, could drive the development of new methods and practices and raise the bar for all organizations supporting research, scholarship, and education. This paper describes the methods and procedures developed for deploying, supporting, and continuously improving the Blue Waters system and its services during the last five years. Being the first US sustained-petascale computing platform available to the open-science community, the Blue Waters project pioneered various unique practices that we are sharing to be adopted and further improved by the community. We present our support and service methodologies, and the leadership practices employed for ensuring that the system stays highly efficient and productive. We also provide the return on investment summaries related to deploying and operating the system.

AB - Building and operating versatile extreme-scale computing systems that work productively for a range of frontier research domains present many challenges and opportunities. Solutions created, experiences acquired, and lessons learned, while rarely published, could drive the development of new methods and practices and raise the bar for all organizations supporting research, scholarship, and education. This paper describes the methods and procedures developed for deploying, supporting, and continuously improving the Blue Waters system and its services during the last five years. Being the first US sustained-petascale computing platform available to the open-science community, the Blue Waters project pioneered various unique practices that we are sharing to be adopted and further improved by the community. We present our support and service methodologies, and the leadership practices employed for ensuring that the system stays highly efficient and productive. We also provide the return on investment summaries related to deploying and operating the system.

KW - Best practices

KW - HPC center

KW - System management

UR - http://www.scopus.com/inward/record.url?scp=85064112418&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064112418&partnerID=8YFLogxK

U2 - 10.1109/SC.2018.00056

DO - 10.1109/SC.2018.00056

M3 - Conference contribution

AN - SCOPUS:85064112418

T3 - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

SP - 673

EP - 684

BT - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -