Deploying a large petascale system: The Blue Waters experience

Research output: Contribution to journal › Conference article

Abstract

Deploying a large parallel system is typically a very complex process involving several steps: preparation, delivery, installation, testing, and acceptance. Although several petascale machines are now available, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experience gained during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation focuses on the final deployment steps, in which the system was intensively tested and accepted by NCSA. After a brief introduction to the Blue Waters architecture, the paper describes in detail the set of acceptance tests employed and reports many of the results obtained. It then presents the major lessons learned during the process. These experiences and lessons should help guide similarly complex deployments in the future.

Original language: English (US)
Pages (from-to): 198-209
Number of pages: 12
Journal: Procedia Computer Science
Volume: 29
State: Published - Jan 1 2014
Event: 14th Annual International Conference on Computational Science, ICCS 2014 - Cairns, QLD, Australia
Duration: Jun 10 2014 - Jun 12 2014

Keywords

  • Acceptance testing
  • Large system deployment
  • Petascale performance

ASJC Scopus subject areas

  • Computer Science (all)
