Providing Pin-point Page-level Precision to 1 Trillion Tokens of Text for Workset Creation

David Bainbridge, J. Stephen Downie, Boris Capitanu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We report on work undertaken to develop a web environment that lets users search over 1 trillion tokens of text in the HathiTrust Part-of-Speech Extracted Features Dataset, down to the level of individual pages, to help produce worksets for scholarly analysis. We present an extended example of the web environment in use, along with details of its implementation.
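To illustrate what page-level search over part-of-speech token counts can look like, here is a minimal sketch. It is not the authors' implementation. It assumes a simplified, hypothetical in-memory layout in which each page maps a token to its POS-tag counts, loosely modeled on the per-page token/POS counts the Extracted Features data provides. The Penn Treebank-style tags, the volume IDs, and the functions `build_index`, `query`, and `workset_volumes` are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical layout (an assumption, not the real dataset schema):
# {volume_id: {page_seq: {token: {pos_tag: count}}}}

def build_index(volumes):
    """Build an inverted index mapping (token, POS) -> set of (volume_id, page_seq)."""
    index = defaultdict(set)
    for vol_id, pages in volumes.items():
        for seq, token_pos_count in pages.items():
            for token, pos_counts in token_pos_count.items():
                for pos, count in pos_counts.items():
                    if count > 0:
                        index[(token.lower(), pos)].add((vol_id, seq))
    return index

def query(index, terms):
    """Return sorted (volume_id, page_seq) pairs whose page has every (token, POS) term."""
    result = None
    for token, pos in terms:
        # .get() avoids inserting empty entries into the defaultdict
        hits = index.get((token.lower(), pos), set())
        # '&' builds a new set, so the index's own sets are never mutated
        result = hits if result is None else result & hits
    return sorted(result or set())

def workset_volumes(page_hits):
    """Collapse page-level hits into the distinct volumes a workset would list."""
    return sorted({vol_id for vol_id, _ in page_hits})

# Usage with hypothetical data: "sail" as a verb versus as a noun.
volumes = {
    "vol.001": {1: {"whale": {"NN": 3}, "sail": {"VB": 1}}, 2: {"sail": {"NN": 2}}},
    "vol.002": {5: {"whale": {"NN": 1}, "sail": {"VB": 2}}},
}
index = build_index(volumes)
pages = query(index, [("whale", "NN"), ("sail", "VB")])
print(pages)                   # [('vol.001', 1), ('vol.002', 5)]
print(workset_volumes(pages))  # ['vol.001', 'vol.002']
```

Keying the index on (token, POS) pairs rather than on bare tokens is what lets a query tell "sail" the verb apart from "sail" the noun. Returning page identifiers rather than whole volumes keeps the page-level precision the abstract describes, and volumes can still be recovered from those pages when a coarser workset is wanted.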

Original language: English (US)
Title of host publication: JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 407-408
Number of pages: 2
ISBN (Electronic): 9781450351782
DOI: https://doi.org/10.1145/3197026.3203875
State: Published, May 23 2018
Event: 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018, Fort Worth, United States
Duration: Jun 3 2018 to Jun 7 2018

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 18th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2018
Country: United States
City: Fort Worth
Period: 6/3/18 to 6/7/18

Keywords

  • extract feature text analysis
  • very large digital libraries
  • workset creation

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing Pin-point Page-level Precision to 1 Trillion Tokens of Text for Workset Creation. In JCDL 2018 - Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (pp. 407-408). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3197026.3203875