Improving digital libraries' provision of digital humanities datasets: A case study of HTRC literature dataset

Yuerong Hu, Ming Jiang, Ted Underwood, J. Stephen Downie

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper investigates the limitations and challenges of curated datasets that digital libraries provide in support of digital humanities research. We present a use case that draws on an English literature dataset of 178,381 volumes, curated by the HathiTrust Research Center (HTRC), to measure change in three literary genres. These volumes were selected from over 17 million digitized items in the HathiTrust Digital Library. We demonstrate our methods and workflow for improving the representativeness and scholarly usability of the existing datasets. We analyzed and effectively overcame three common limitations: duplicate volumes, uneven distribution of data, and OCR errors. We suggest that stakeholders of digital libraries flag and address these limitations to improve the usability of their provisions for digital humanities research.
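
As a rough illustration of two of the limitations named above (duplicate volumes and uneven distribution of data), the following Python sketch shows one simple way such cleaning could be approached. It is not the authors' workflow, which the abstract does not detail. The record fields (`author`, `title`, `year`), the helper names, the matching rule, and the per-decade cap are all assumptions made for this example, and the records are placeholders rather than HTRC data.

```python
import random
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def drop_duplicates(volumes):
    """Keep the first volume seen for each normalized (author, title) pair.

    Assumption: two records with the same normalized author and title are
    copies of the same work. Real metadata would need a looser match.
    """
    seen = set()
    unique = []
    for vol in volumes:
        key = (normalize(vol["author"]), normalize(vol["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(vol)
    return unique


def cap_per_decade(volumes, cap, seed=0):
    """Randomly keep at most `cap` volumes per decade to even out the distribution."""
    rng = random.Random(seed)
    by_decade = defaultdict(list)
    for vol in volumes:
        by_decade[vol["year"] // 10 * 10].append(vol)
    balanced = []
    for decade in sorted(by_decade):
        group = by_decade[decade]
        balanced.extend(group if len(group) <= cap else rng.sample(group, cap))
    return balanced


# Placeholder records for illustration only; not drawn from the HTRC dataset.
volumes = [
    {"author": "Doe, Jane", "title": "A First Novel", "year": 1816},
    {"author": "Doe, Jane.", "title": "A first novel.", "year": 1816},
    {"author": "Roe, Richard", "title": "A Second Novel", "year": 1853},
    {"author": "Poe, Mary", "title": "A Third Novel", "year": 1871},
    {"author": "Low, Thomas", "title": "A Fourth Novel", "year": 1874},
]

unique = drop_duplicates(volumes)
balanced = cap_per_decade(unique, cap=1)
print(len(volumes), len(unique), len(balanced))
```

Capping each decade is only one of several possible ways to reduce imbalance. Stratified sampling or weighting would preserve more data and may suit some research questions better.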

Original language: English (US)
Title of host publication: JCDL 2020 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 405-408
Number of pages: 4
ISBN (Electronic): 9781450375856
State: Published - Aug 1 2020
Event: 2020 ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2020 - Virtual, Online, China
Duration: Aug 1 2020 → Aug 5 2020

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Conference

Conference: 2020 ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2020
Country/Territory: China
City: Virtual, Online
Period: 8/1/20 → 8/5/20

Keywords

  • Cultural analytics
  • Datasets
  • Digital humanities
  • Digital libraries

ASJC Scopus subject areas

  • General Engineering
