Automatic Reproduction of Workflows in the Snakemake Workflow Catalog and nf-core Registries

Samuel Grayson, Darko Marinov, Daniel S. Katz, Reed Milewicz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Workflows make it easier for scientists to assemble computational experiments consisting of many disparate components. However, those disparate components also increase the probability that the computational experiment fails to be reproducible. Even if software is reproducible today, it may become irreproducible tomorrow without the software itself changing at all, because the software environment in which it runs keeps changing. To alleviate irreproducibility, workflow engines integrate with container engines. Additionally, communities that sprang up around workflow engines started to host registries for workflows that follow standards. These standards reduce the effort needed to make workflows automatically reproducible. In this paper, we study automatic reproduction of workflows from two registries, focusing on non-crashing executions. The experimental data lets us estimate an upper bound on the reproducibility that workflow engines could achieve. We identify lessons learned in achieving reproducibility in practice.
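The "upper bound" in the abstract rests on a simple containment: a run that crashes cannot reproduce anything, so the share of non-crashing runs caps the share of reproducible runs. The minimal Python sketch below illustrates only that reasoning. The `Run` record, its fields, and the toy data are hypothetical and are not the authors' tooling or results; it also assumes, for illustration, that reproducibility is judged by comparing a run's outputs against a reference run.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One attempted execution of a registry workflow (hypothetical record)."""
    workflow: str
    crashed: bool        # True if the workflow engine exited with an error
    outputs_match: bool  # True if outputs equal those of a reference run (assumed criterion)


def non_crashing_rate(runs: list[Run]) -> float:
    """Fraction of runs that finished without crashing."""
    if not runs:
        return 0.0
    return sum(not r.crashed for r in runs) / len(runs)


def reproducible_rate(runs: list[Run]) -> float:
    """Fraction of runs that finished AND matched the reference outputs."""
    if not runs:
        return 0.0
    # A run can only reproduce the reference outputs if it finished at all,
    # so every counted run here is also counted by non_crashing_rate.
    return sum((not r.crashed) and r.outputs_match for r in runs) / len(runs)


# Toy records for illustration only; these are not data from the paper.
runs = [
    Run("wf-a", crashed=False, outputs_match=True),
    Run("wf-b", crashed=False, outputs_match=False),
    Run("wf-c", crashed=True, outputs_match=False),
    Run("wf-d", crashed=False, outputs_match=True),
]

print(non_crashing_rate(runs))  # 0.75
print(reproducible_rate(runs))  # 0.5
```

Because the reproducible runs are a subset of the non-crashing runs, measuring only crashes gives an optimistic estimate: it can overstate reproducibility but never understate it.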

Original language: English (US)
Title of host publication: Proceedings of the 1st ACM Conference on Reproducibility and Replicability, REP 2023
Publisher: Association for Computing Machinery
Pages: 74-84
Number of pages: 11
ISBN (Electronic): 9798400701764
State: Published - Jun 27 2023
Event: 1st ACM Conference on Reproducibility and Replicability, REP 2023 - Santa Cruz, United States
Duration: Jun 27 2023 to Jun 29 2023

Publication series

Name: Proceedings of the 1st ACM Conference on Reproducibility and Replicability, REP 2023

Conference

Conference: 1st ACM Conference on Reproducibility and Replicability, REP 2023
Country/Territory: United States
City: Santa Cruz
Period: 6/27/23 to 6/29/23

Keywords

  • reproducibility
  • research software engineering
  • workflow engines

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software
