Scientific tests and continuous integration strategies to enhance reproducibility in the scientific software context

Matthew Krafczyk, August Shi, Adhithya Bhaskar, Darko Marinov, Victoria Stodden

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Continuous integration (CI) is a well-established technique in commercial and open-source software projects, although not routinely used in scientific publishing. In the scientific software context, CI can serve two functions to increase reproducibility of scientific results: providing an established platform for testing the reproducibility of these results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles as well as examine and discuss existing tests. We introduce the notion of a scientific test as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time or resource intensive to easily adapt to a CI context, we recommend the inclusion of results from reduced versions of their work (e.g., run at lower resolution, with shorter time scales, with smaller data sets) alongside their primary results within their article. While these smaller versions may be less interesting scientifically, they can serve to verify that published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
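The reduction-test idea described in the abstract (rerunning a scaled-down version of a published computation inside CI and checking it against the published result) can be sketched as follows. This is a hypothetical illustration, not code from the paper: the computation, function names, sample count, and tolerance are all assumptions, with a truncated Leibniz-series estimate of π standing in for an article's expensive computation and its known value standing in for the published reference.

```python
import math

def estimate_pi(n_terms: int) -> float:
    """Leibniz series for pi; a stand-in for a real article's computation."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4.0 * total

def test_reduced_run_reproduces_published_value():
    # A "full" article run might use n_terms=10**9; CI instead runs a cheap
    # reduced version and compares it to the published reference value at a
    # tolerance appropriate to the reduction (here, the series error bound
    # 4/(2N+1) is below 1e-3 for N=5000).
    reduced = estimate_pi(n_terms=5000)
    published_reference = math.pi  # reference result stored with the code
    assert abs(reduced - published_reference) < 1e-3
```

Under this pattern, the full-scale run produces the published figures once, while a CI service runs only `pytest` on the reduced version at every commit, catching breakage in the published code and data without the cost of full reproduction.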

Original language: English (US)
Title of host publication: P-RECS 2019 - Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, co-located with HPDC 2019
Publisher: Association for Computing Machinery, Inc
Pages: 23-28
Number of pages: 6
ISBN (Electronic): 9781450367561
DOIs: 10.1145/3322790.3330595
State: Published - Jun 17 2019
Event: 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS 2019, co-located with HPDC 2019 - Phoenix, United States
Duration: Jun 24 2019 → …

Publication series

Name: P-RECS 2019 - Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, co-located with HPDC 2019

Conference

Conference: 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS 2019, co-located with HPDC 2019
Country: United States
City: Phoenix
Period: 6/24/19 → …

Keywords

  • Continuous integration
  • Reproducibility
  • Scientific software
  • Software reliability
  • Software testing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Cite this

Krafczyk, M., Shi, A., Bhaskar, A., Marinov, D., & Stodden, V. (2019). Scientific tests and continuous integration strategies to enhance reproducibility in the scientific software context. In P-RECS 2019 - Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, co-located with HPDC 2019 (pp. 23-28). (P-RECS 2019 - Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, co-located with HPDC 2019). Association for Computing Machinery, Inc. https://doi.org/10.1145/3322790.3330595
