Detailed Implementation of a Reproducible Machine Learning-Enabled Workflow

Kenneth E. Schackart, Heidi J. Imker, Charles E. Cook

Research output: Contribution to journal › Article › peer-review


Machine learning (ML) and advanced computational methods are powerful tools for processing and deriving value from large data volumes. These methods are being developed and deployed rapidly, but best practices regarding code and data standards are still evolving, leading to irreproducibility of ML-enabled research. In this Practice Paper, we describe our efforts to make open and reproducible an ML-enabled research project that created a global inventory of biodata resources. To contribute to community conversations on evolving norms and expectations, we present our experiences as a practical, real-world case study that includes the implementation details as well as our overall approach and subsequent decisions. Our goal in openly sharing this experience is to provide a concrete example that others may consider as they look to vet, adapt, and adopt similar strategies to make their own work open and reproducible.

Original language: English (US)
Article number: 23
Journal: Data Science Journal
Issue number: 1
State: Published - 2024


  • biodata resource inventory
  • computational reproducibility
  • FAIR data
  • machine learning workflow
  • open science
  • research software

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science Applications

