TY - GEN
T1 - Best practices and lessons from deploying and operating a sustained-petascale system
T2 - 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
AU - Bauer, Gregory H.
AU - Bode, Brett
AU - Enos, Jeremy
AU - Kramer, William T.
AU - Lathrop, Scott
AU - Mendes, Celso L.
AU - Sisneros, Roberto R.
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Building and operating versatile extreme-scale computing systems that work productively for a range of frontier research domains present many challenges and opportunities. Solutions created, experiences acquired, and lessons learned, while rarely published, could drive the development of new methods and practices and raise the bar for all organizations supporting research, scholarship, and education. This paper describes the methods and procedures developed for deploying, supporting, and continuously improving the Blue Waters system and its services during the last five years. Being the first US sustained-petascale computing platform available to the open-science community, the Blue Waters project pioneered various unique practices that we are sharing to be adopted and further improved by the community. We present our support and service methodologies, and the leadership practices employed for ensuring that the system stays highly efficient and productive. We also provide the return on investment summaries related to deploying and operating the system.
KW - Best practices
KW - HPC center
KW - System management
UR - http://www.scopus.com/inward/record.url?scp=85064112418&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064112418&partnerID=8YFLogxK
U2 - 10.1109/SC.2018.00056
DO - 10.1109/SC.2018.00056
M3 - Conference contribution
AN - SCOPUS:85064112418
T3 - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
SP - 673
EP - 684
BT - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 November 2018 through 16 November 2018
ER -