Using the HTRC data capsule model to promote reuse and evolution of experimental analysis of digital library data: A case study of topic modeling

David Bainbridge, David M. Nichols, Annika Hinze, J. Stephen Downie

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We report on a case study to independently reproduce the work given in a publicly available blog on how to develop a topic model sourced from a collection of texts, where both the data set and source code used are readily available. More specifically, we detail the steps necessary, and the challenges that had to be overcome, to replicate the work using the HathiTrust Research Center's virtual machine Data Capsule platform. From this we make recommendations for authors to follow, based on the lessons learned. We also show that the Data Capsule model can be put to work in a way that is of benefit to those interested in supporting computational reproducibility within their organizations.

Original language: English (US)
Title of host publication: Proceedings - 2019 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019
Editors: Maria Bonn, Dan Wu, Stephen J. Downie, Alain Martaus
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 463-464
Number of pages: 2
ISBN (Electronic): 9781728115474
DOI: https://doi.org/10.1109/JCDL.2019.00124
State: Published - Jun 2019
Event: 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019 - Urbana-Champaign, United States
Duration: Jun 2 2019 to Jun 6 2019

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2019-June
ISSN (Print): 1552-5996

Conference

Conference: 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019
Country: United States
City: Urbana-Champaign
Period: 6/2/19 to 6/6/19

Fingerprint

Digital libraries
Blogs
Virtual machine

Keywords

  • Digital-Libraries
  • Experimental-Reproducibility
  • Virtual-Machine

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Bainbridge, D., Nichols, D. M., Hinze, A., & Downie, J. S. (2019). Using the HTRC data capsule model to promote reuse and evolution of experimental analysis of digital library data: A case study of topic modeling. In M. Bonn, D. Wu, S. J. Downie, & A. Martaus (Eds.), Proceedings - 2019 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019 (pp. 463-464). [8791136] (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries; Vol. 2019-June). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JCDL.2019.00124

@inproceedings{5b7656d3a23d4345a06d803bdb28994b,
title = "Using the HTRC data capsule model to promote reuse and evolution of experimental analysis of digital library data: A case study of topic modeling",
abstract = "We report on a case study to independently reproduce the work given in a publicly available blog on how to develop a topic model sourced from a collection of texts, where both the data set and source code used are readily available. More specifically, we detail the steps necessary, and the challenges that had to be overcome, to replicate the work using the HathiTrust Research Center's virtual machine Data Capsule platform. From this we make recommendations for authors to follow, based on the lessons learned. We also show that the Data Capsule model can be put to work in a way that is of benefit to those interested in supporting computational reproducibility within their organizations.",
keywords = "Digital-Libraries, Experimental-Reproducibility, Virtual-Machine",
author = "David Bainbridge and Nichols, {David M.} and Annika Hinze and Downie, {J. Stephen}",
year = "2019",
month = "6",
doi = "10.1109/JCDL.2019.00124",
language = "English (US)",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "463--464",
editor = "Maria Bonn and Dan Wu and Downie, {Stephen J.} and Alain Martaus",
booktitle = "Proceedings - 2019 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019",
address = "United States",

}

TY - GEN
T1 - Using the HTRC data capsule model to promote reuse and evolution of experimental analysis of digital library data
T2 - A case study of topic modeling
AU - Bainbridge, David
AU - Nichols, David M.
AU - Hinze, Annika
AU - Downie, J. Stephen
PY - 2019/6
Y1 - 2019/6
N2 - We report on a case study to independently reproduce the work given in a publicly available blog on how to develop a topic model sourced from a collection of texts, where both the data set and source code used are readily available. More specifically, we detail the steps necessary, and the challenges that had to be overcome, to replicate the work using the HathiTrust Research Center's virtual machine Data Capsule platform. From this we make recommendations for authors to follow, based on the lessons learned. We also show that the Data Capsule model can be put to work in a way that is of benefit to those interested in supporting computational reproducibility within their organizations.
AB - We report on a case study to independently reproduce the work given in a publicly available blog on how to develop a topic model sourced from a collection of texts, where both the data set and source code used are readily available. More specifically, we detail the steps necessary, and the challenges that had to be overcome, to replicate the work using the HathiTrust Research Center's virtual machine Data Capsule platform. From this we make recommendations for authors to follow, based on the lessons learned. We also show that the Data Capsule model can be put to work in a way that is of benefit to those interested in supporting computational reproducibility within their organizations.
KW - Digital-Libraries
KW - Experimental-Reproducibility
KW - Virtual-Machine
UR - http://www.scopus.com/inward/record.url?scp=85071053334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071053334&partnerID=8YFLogxK
U2 - 10.1109/JCDL.2019.00124
DO - 10.1109/JCDL.2019.00124
M3 - Conference contribution
AN - SCOPUS:85071053334
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 463
EP - 464
BT - Proceedings - 2019 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019
A2 - Bonn, Maria
A2 - Wu, Dan
A2 - Downie, Stephen J.
A2 - Martaus, Alain
PB - Institute of Electrical and Electronics Engineers Inc.
ER -