Transferring a petabyte in a day

Rajkumar Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, Franck Cappello

Research output: Contribution to journal › Article › peer-review

Abstract

Extreme-scale simulations and experiments can generate large amounts of data, whose volume can exceed the compute and/or storage capacity at the simulation or experimental facility. With the emergence of ultra-high-speed networks, researchers are considering pipelined approaches in which data are passed to a remote facility for analysis. Here we examine an extreme-scale cosmology simulation that, when run on a large fraction of a leadership computer, generates data at a rate of one petabyte per elapsed day. Writing those data to disk is inefficient and impractical, and in situ analysis poses its own difficulties. Thus we implement a pipeline in which data are generated on one supercomputer and then transferred, as they are generated, to a remote supercomputer for analysis. We use the Swift scripting language to instantiate this pipeline across Argonne National Laboratory and the National Center for Supercomputing Applications, which are connected by a 100 Gb/s network. We demonstrate that by using the Globus transfer service we can achieve a sustained rate of 93 Gb/s over a 24-hour period, thus attaining our performance goal of one petabyte moved in 24 hours. This paper describes the methods used and summarizes the lessons learned in this demonstration.
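
For context, a quick back-of-the-envelope check (not taken from the paper, and assuming a decimal petabyte of 10^15 bytes) shows why the sustained 93 Gb/s reported above meets the one-petabyte-in-24-hours goal:

    # Sustained rate needed to move one decimal petabyte (10^15 bytes) in 24 hours.
    PETABYTE_BITS = 1e15 * 8          # bytes -> bits
    SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 s

    required_gbps = PETABYTE_BITS / SECONDS_PER_DAY / 1e9
    print(f"Required sustained rate: {required_gbps:.1f} Gb/s")  # prints ~92.6 Gb/s

The reported 93 Gb/s sustained over 24 hours therefore just clears the roughly 92.6 Gb/s minimum needed to move one petabyte in a day.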

Original language: English (US)
Pages (from-to): 191-198
Number of pages: 8
Journal: Future Generation Computer Systems
Volume: 88
State: Published - Nov 2018

Keywords

  • Cosmology workflow
  • GridFTP
  • Large data transfer
  • Pipeline
  • Wide area data transfer

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
