A preliminary evaluation of HathiTrust metadata

Assessing the sufficiency of legacy records

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Print-based libraries use metadata (specifically MARC catalog records) both for bibliographic control and to support discovery through online public access catalogs. Depending on its accuracy, completeness, and detail, metadata can afford an aerial view of a collection's topical strengths, scope of coverage, and item-to-item relationships, but the view offered is partly a function of metadata design. Most MARC records were created to support the management of large print collections and were optimized to meet the requirements of library online public access catalogs. How well do pre-existing MARC records serve the discovery needs of scholars using a large-scale digital library that hosts collections of retrospectively digitized books and serials? This paper reports on an ongoing assessment of the utility of the MARC-based metadata underlying the HathiTrust Digital Library and explores the implications for advanced computational access to texts in the HathiTrust. We consider the utility of metadata to scholars creating worksets for analysis, examining three user scenarios gleaned from an ongoing user-requirements study conducted for the HathiTrust Research Center: (1) using metadata fields in combination for corpus characterization and discovery; (2) relying on metadata to identify resources of interest; and (3) using bibliographies of known items to seed research worksets. Our goal is to better understand the need for metadata remediation and augmentation and to assess the scope of the additional work required.

Original language: English (US)
Title of host publication: 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 317-320
Number of pages: 4
ISBN (Electronic): 9781479955695
DOI: 10.1109/JCDL.2014.6970186
State: Published - Dec 1 2014
Event: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 - London, United Kingdom
Duration: Sep 8 2014 - Sep 12 2014

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Country: United Kingdom
City: London
Period: 9/8/14 - 9/12/14

Keywords

  • Digital Library metadata requirements
  • HathiTrust Research Center
  • MARC-based metadata
  • metadata evaluation
  • metadata reliability
  • workset creation for scholarly analysis

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Fenlon, K., Fallaw, C., Cole, T., & Han, M. J. (2014). A preliminary evaluation of HathiTrust metadata: Assessing the sufficiency of legacy records. In 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 (pp. 317-320). [6970186] (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JCDL.2014.6970186

@inproceedings{f395f446dac941f79de0db20cbd3110d,
title = "A preliminary evaluation of HathiTrust metadata: Assessing the sufficiency of legacy records",
abstract = "Print-based libraries use metadata (specifically MARC catalog records) for both bibliographic control and to support discovery through online public access catalogs. Depending on its accuracy, completeness, and detail, metadata can afford an aerial view of a collection's topical strengths, scope of coverage, and item-to-item relationships, but the view offered is in part a function of metadata design. Most MARC records were created to support management of large print collections and optimized to meet the requirements of library online public access catalogs. How well do pre-existing MARC records serve the discovery needs of scholars using a large-scale digital library hosting collections of retrospectively digitized books and serials? This paper reports on an ongoing assessment of the utility of the MARC-based metadata underlying the HathiTrust Digital Library and explores the implications for advanced computational access to texts in the HathiTrust. We consider here the utility of metadata to scholars creating worksets for analysis, examining three user scenarios, which were gleaned from an ongoing user-requirements study done for the HathiTrust Research Center: (1) using metadata fields in combination for corpus characterization and discovery; (2) relying on metadata to identify resources of interest; and (3) using bibliographies of known items to seed research worksets. Our goal is to better understand the need for metadata remediation and augmentation and assess the scope of additional work required.",
keywords = "Digital Library metadata requirements, HathiTrust Research Center, MARC-based metadata, metadata evaluation, metadata reliability, workset creation for scholarly analysis",
author = "Katrina Fenlon and Colleen Fallaw and Timothy Cole and Han, {Myung Ja}",
year = "2014",
month = "12",
day = "1",
doi = "10.1109/JCDL.2014.6970186",
language = "English (US)",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "317--320",
booktitle = "2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014",
address = "United States",

}

TY - GEN

T1 - A preliminary evaluation of HathiTrust metadata

T2 - Assessing the sufficiency of legacy records

AU - Fenlon, Katrina

AU - Fallaw, Colleen

AU - Cole, Timothy

AU - Han, Myung Ja

PY - 2014/12/1

Y1 - 2014/12/1

N2 - Print-based libraries use metadata (specifically MARC catalog records) for both bibliographic control and to support discovery through online public access catalogs. Depending on its accuracy, completeness, and detail, metadata can afford an aerial view of a collection's topical strengths, scope of coverage, and item-to-item relationships, but the view offered is in part a function of metadata design. Most MARC records were created to support management of large print collections and optimized to meet the requirements of library online public access catalogs. How well do pre-existing MARC records serve the discovery needs of scholars using a large-scale digital library hosting collections of retrospectively digitized books and serials? This paper reports on an ongoing assessment of the utility of the MARC-based metadata underlying the HathiTrust Digital Library and explores the implications for advanced computational access to texts in the HathiTrust. We consider here the utility of metadata to scholars creating worksets for analysis, examining three user scenarios, which were gleaned from an ongoing user-requirements study done for the HathiTrust Research Center: (1) using metadata fields in combination for corpus characterization and discovery; (2) relying on metadata to identify resources of interest; and (3) using bibliographies of known items to seed research worksets. Our goal is to better understand the need for metadata remediation and augmentation and assess the scope of additional work required.

AB - Print-based libraries use metadata (specifically MARC catalog records) for both bibliographic control and to support discovery through online public access catalogs. Depending on its accuracy, completeness, and detail, metadata can afford an aerial view of a collection's topical strengths, scope of coverage, and item-to-item relationships, but the view offered is in part a function of metadata design. Most MARC records were created to support management of large print collections and optimized to meet the requirements of library online public access catalogs. How well do pre-existing MARC records serve the discovery needs of scholars using a large-scale digital library hosting collections of retrospectively digitized books and serials? This paper reports on an ongoing assessment of the utility of the MARC-based metadata underlying the HathiTrust Digital Library and explores the implications for advanced computational access to texts in the HathiTrust. We consider here the utility of metadata to scholars creating worksets for analysis, examining three user scenarios, which were gleaned from an ongoing user-requirements study done for the HathiTrust Research Center: (1) using metadata fields in combination for corpus characterization and discovery; (2) relying on metadata to identify resources of interest; and (3) using bibliographies of known items to seed research worksets. Our goal is to better understand the need for metadata remediation and augmentation and assess the scope of additional work required.

KW - Digital Library metadata requirements

KW - HathiTrust Research Center

KW - MARC-based metadata

KW - metadata evaluation

KW - metadata reliability

KW - workset creation for scholarly analysis

UR - http://www.scopus.com/inward/record.url?scp=84919371838&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84919371838&partnerID=8YFLogxK

U2 - 10.1109/JCDL.2014.6970186

DO - 10.1109/JCDL.2014.6970186

M3 - Conference contribution

T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

SP - 317

EP - 320

BT - 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -