TY - GEN
T1 - Automatic Reproduction of Workflows in the Snakemake Workflow Catalog and nf-core Registries
AU - Grayson, Samuel
AU - Marinov, Darko
AU - Katz, Daniel S.
AU - Milewicz, Reed
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/6/27
Y1 - 2023/6/27
N2 - Workflows make it easier for scientists to assemble computational experiments consisting of many disparate components. However, those disparate components also increase the probability that the computational experiment fails to be reproducible. Even if software is reproducible today, it may become irreproducible tomorrow without the software itself changing at all, because of the constantly changing software environment in which the software is run. To alleviate irreproducibility, workflow engines integrate with container engines. Additionally, communities that sprang up around workflow engines started to host registries for workflows that follow standards. These standards reduce the effort needed to make workflows automatically reproducible. In this paper, we study automatic reproduction of workflows from two registries, focusing on non-crashing executions. The experimental data lets us analyze the upper bound to which workflow engines could achieve reproducibility. We identify lessons learned in achieving reproducibility in practice.
KW - reproducibility
KW - research software engineering
KW - workflow engines
UR - http://www.scopus.com/inward/record.url?scp=85166009941&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166009941&partnerID=8YFLogxK
U2 - 10.1145/3589806.3600037
DO - 10.1145/3589806.3600037
M3 - Conference contribution
AN - SCOPUS:85166009941
T3 - Proceedings of the 1st ACM Conference on Reproducibility and Replicability, REP 2023
SP - 74
EP - 84
BT - Proceedings of the 1st ACM Conference on Reproducibility and Replicability, REP 2023
PB - Association for Computing Machinery
T2 - 1st ACM Conference on Reproducibility and Replicability, REP 2023
Y2 - 27 June 2023 through 29 June 2023
ER -