Query representation for cross-temporal information retrieval

Miles James Efron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper addresses the problem of long-term language change in information retrieval (IR) systems. IR research has often ignored lexical drift. But in the emerging domain of massive digitized book collections, the risk of vocabulary mismatch due to language change is high. Collections such as Google Books and the Hathi Trust contain text written in the vernaculars of many centuries. With respect to IR, changes in vocabulary and orthography make 14th-Century English qualitatively different from 21st-Century English. This challenges retrieval models that rely on keyword matching. With this challenge in mind, we ask: given a query written in contemporary English, how can we retrieve relevant documents that were written in early English? We argue that search in historically diverse corpora is similar to cross-language retrieval (CLIR). By considering "modern" English and "archaic" English as distinct languages, CLIR techniques can improve what we call cross-temporal IR (CTIR). We focus on ways to combine evidence to improve CTIR effectiveness, proposing and testing several ways to handle language change during book search. We find that a principled combination of three sources of evidence during relevance feedback yields strong CTIR performance.

Original language: English (US)
Title of host publication: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 383-392
Number of pages: 10
DOI: https://doi.org/10.1145/2484028.2484054
State: Published - Sep 2 2013
Event: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland
Duration: Jul 28 2013 to Aug 1 2013

Publication series

Name: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Country: Ireland
City: Dublin
Period: 7/28/13 to 8/1/13

Keywords

  • Book search
  • Digital libraries
  • Information retrieval
  • Temporality

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Efron, M. J. (2013). Query representation for cross-temporal information retrieval. In SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 383-392). https://doi.org/10.1145/2484028.2484054