Mapping mutable genres in structurally complex volumes

Ted Underwood, Michael L. Black, Loretta Auvil, Boris Capitanu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To mine large digital libraries in humanistically meaningful ways, we need to divide them by genre. This is a task that classification algorithms are well suited to assist, but they need adjustment to address the specific challenges of this domain. Digital libraries pose two problems of scale not usually found in the article datasets used to test these algorithms. 1) Because libraries span several centuries, the genres being identified may change gradually across the time axis. 2) Because volumes are much longer than articles, they tend to be internally heterogeneous, and the classification task also requires segmentation. We describe a multilayered solution that trains hidden Markov models to segment volumes, and uses ensembles of overlapping classifiers to address historical change. We demonstrate this on a collection of 469,200 volumes drawn from HathiTrust Digital Library.
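As a rough illustration of the segmentation idea in the abstract, the sketch below smooths noisy page-level genre predictions with a hidden Markov model decoded by the Viterbi algorithm. It is not the authors' implementation: the function name `viterbi_smooth`, the `stay` parameter, the uniform prior, the two genre labels, and the example probabilities are all assumptions made for illustration, and the paper's models are trained on data rather than hand-set.

```python
import math

def viterbi_smooth(page_probs, genres, stay=0.9):
    """Return the most likely genre sequence for the pages of one volume.

    page_probs: list of dicts mapping genre -> per-page classifier
                probability (hypothetical input format).
    genres:     list of at least two genre labels.
    stay:       assumed probability that a page keeps the genre of the
                previous page (hand-set here, not learned).
    """
    switch = (1.0 - stay) / (len(genres) - 1)

    def log_trans(prev, cur):
        return math.log(stay if prev == cur else switch)

    def log_emit(probs, genre):
        # Guard against log(0) for pages with zero probability.
        return math.log(max(probs[genre], 1e-12))

    # Uniform prior over the genre of the first page (an assumption).
    score = {g: -math.log(len(genres)) + log_emit(page_probs[0], g)
             for g in genres}
    back = []  # back[i][g] = best predecessor of genre g at page i + 1

    for probs in page_probs[1:]:
        new_score, pointers = {}, {}
        for g in genres:
            best_prev = max(genres, key=lambda p: score[p] + log_trans(p, g))
            pointers[g] = best_prev
            new_score[g] = (score[best_prev] + log_trans(best_prev, g)
                            + log_emit(probs, g))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the most likely final genre.
    path = [max(genres, key=lambda g: score[g])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Made-up page-level probabilities for a six-page volume.
genres = ["prose", "poetry"]
pages = [
    {"prose": 0.95, "poetry": 0.05},
    {"prose": 0.80, "poetry": 0.20},
    {"prose": 0.45, "poetry": 0.55},  # isolated, weakly "poetry" page
    {"prose": 0.85, "poetry": 0.15},
    {"prose": 0.05, "poetry": 0.95},
    {"prose": 0.20, "poetry": 0.80},
]
smoothed = viterbi_smooth(pages, genres)
print(smoothed)
```

In this made-up example the third page, which a page-by-page classifier would label poetry, is absorbed into the surrounding prose run, while the sustained shift at the end of the volume is kept as a separate segment.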

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
Publisher: IEEE Computer Society
Pages: 95-103
Number of pages: 9
ISBN (Print): 9781479912926
State: Published - 2013
Event: 2013 IEEE International Conference on Big Data, Big Data 2013 - Santa Clara, CA, United States
Duration: Oct 6 2013 to Oct 9 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013

ASJC Scopus subject areas

  • Software
