Toward enabling reproducibility for data-intensive research using the Whole Tale platform

Kyle Chard, Niall Gaffney, Mihael Hategan, Kacper Kowalik, Bertram Ludäscher, Timothy McPhillips, Jarek Nabrzyski, Victoria Stodden, Ian Taylor, Thomas Thelen, Matthew J. Turk, Craig Willis

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Whole Tale (http://wholetale.org) is a web-based, open-source platform for reproducible research supporting the creation, sharing, execution, and verification of 'Tales' for the scientific research community. Tales are executable research objects that capture the code, data, and environment along with narrative and workflow information needed to re-create computational results from scientific studies. Creating research objects that enable reproducibility, transparency, and re-execution for computational experiments requiring significant compute resources or utilizing massive data is an especially challenging open problem. We describe opportunities, challenges, and solutions to facilitating reproducibility for data- and compute-intensive research, which we call 'Tales at Scale,' using the Whole Tale computing platform. We highlight challenges and solutions in frontend responsiveness needs, gaps in current middleware design and implementation, network restrictions, containerization, and data access. Finally, we discuss challenges in packaging computational experiment implementations for portable data-intensive Tales and outline future work.

Original language: English (US)
Title of host publication: Parallel Computing
Subtitle of host publication: Technology Trends
Editors: Ian Foster, Gerhard R. Joubert, Ludek Kucera, Wolfgang E. Nagel, Frans Peters
Publisher: IOS Press BV
Pages: 766-778
Number of pages: 13
ISBN (Electronic): 9781643680705
State: Published - 2020

Publication series

Name: Advances in Parallel Computing
Volume: 36
ISSN (Print): 0927-5452
ISSN (Electronic): 1879-808X

Keywords

  • Big data
  • Computational science
  • Cyberinfrastructure
  • Data provenance
  • Platform as a service
  • Replicability
  • Reproducibility
  • Reproducible research
  • Scalability
  • Science as a service
  • Scientific computing
  • Scientific workflows
  • Transparency

ASJC Scopus subject areas

  • General Computer Science
