Improving Access to Large-scale Digital Libraries Through Semantic-enhanced Search and Disambiguation

Annika Hinze, Craig Taube-Schock, David Bainbridge, Rangi Matamua, J. Stephen Downie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With 13,000,000 volumes comprising 4.5 billion pages of text, it is currently very difficult for scholars to locate relevant sets of documents that are useful in their research from the HathiTrust Digital Library (HTDL) using traditional lexically-based retrieval techniques. Existing document search tools and document clustering approaches use purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach offers the potential to overcome the shortcoming of lexical search, but even if an appropriate network of ontologies could be decided upon, it would require a full semantic markup of each document. In this paper, we present a conceptual design and report on the initial implementation of a new framework that affords the benefits of semantic search while minimizing the problems associated with applying existing semantic analysis at scale. Our approach avoids the need for complete semantic document markup using pre-existing ontologies by developing an automatically generated Concept-in-Context (CiC) network seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system analyzes documents by the semantics and context of their content. The disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. Our method achieves a form of semantic-enhanced search that simultaneously exploits the proven scale benefits provided by lexical indexing.

Original language: English (US)
Title of host publication: JCDL 2015 - Proceedings of the 15th ACM/IEEE-CE Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 147-156
Number of pages: 10
ISBN (Electronic): 9781450335942
State: Published - Jun 21 2015
Event: 15th ACM/IEEE-CE Joint Conference on Digital Libraries, JCDL 2015 - Knoxville, United States
Duration: Jun 21 2015 → Jun 25 2015

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2015-June
ISSN (Print): 1552-5996

Other

Other: 15th ACM/IEEE-CE Joint Conference on Digital Libraries, JCDL 2015
Country/Territory: United States
City: Knoxville
Period: 6/21/15 → 6/25/15

Keywords

  • disambiguation
  • semantic classification
  • semantic search

ASJC Scopus subject areas

  • General Engineering
