TY - GEN
T1 - Mapping mutable genres in structurally complex volumes
AU - Underwood, Ted
AU - Black, Michael L.
AU - Auvil, Loretta
AU - Capitanu, Boris
PY - 2013
Y1 - 2013
N2 - To mine large digital libraries in humanistically meaningful ways, we need to divide them by genre. This is a task that classification algorithms are well suited to assist, but they need adjustment to address the specific challenges of this domain. Digital libraries pose two problems of scale not usually found in the article datasets used to test these algorithms. 1) Because libraries span several centuries, the genres being identified may change gradually across the time axis. 2) Because volumes are much longer than articles, they tend to be internally heterogeneous, and the classification task also requires segmentation. We describe a multilayered solution that trains hidden Markov models to segment volumes, and uses ensembles of overlapping classifiers to address historical change. We demonstrate this on a collection of 469,200 volumes drawn from HathiTrust Digital Library.
AB - To mine large digital libraries in humanistically meaningful ways, we need to divide them by genre. This is a task that classification algorithms are well suited to assist, but they need adjustment to address the specific challenges of this domain. Digital libraries pose two problems of scale not usually found in the article datasets used to test these algorithms. 1) Because libraries span several centuries, the genres being identified may change gradually across the time axis. 2) Because volumes are much longer than articles, they tend to be internally heterogeneous, and the classification task also requires segmentation. We describe a multilayered solution that trains hidden Markov models to segment volumes, and uses ensembles of overlapping classifiers to address historical change. We demonstrate this on a collection of 469,200 volumes drawn from HathiTrust Digital Library.
UR - http://www.scopus.com/inward/record.url?scp=84893329276&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893329276&partnerID=8YFLogxK
U2 - 10.1109/BigData.2013.6691676
DO - 10.1109/BigData.2013.6691676
M3 - Conference contribution
AN - SCOPUS:84893329276
SN - 9781479912926
T3 - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
SP - 95
EP - 103
BT - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
PB - IEEE Computer Society
T2 - 2013 IEEE International Conference on Big Data, Big Data 2013
Y2 - 6 October 2013 through 9 October 2013
ER -