An exploration of proximity measures in information retrieval

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In most existing retrieval models, documents are scored primarily based on various kinds of term statistics such as within-document frequencies, inverse document frequencies, and document lengths. Intuitively, the proximity of matched query terms in a document can also be exploited to promote scores of documents in which the matched query terms are close to each other. Such a proximity heuristic, however, has been largely under-explored in the literature; it is unclear how we can model proximity and incorporate a proximity measure into an existing retrieval model. In this paper, we systematically explore the query term proximity heuristic. Specifically, we propose and study the effectiveness of five different proximity measures, each modeling proximity from a different perspective. We then design two heuristic constraints and use them to guide us in incorporating the proposed proximity measures into an existing retrieval model. Experiments on five standard TREC test collections show that one of the proposed proximity measures is indeed highly correlated with document relevance, and by incorporating it into the KL-divergence language model and the Okapi BM25 model, we can significantly improve retrieval performance.
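To make the proximity heuristic concrete, the following is a minimal Python sketch of one simple way to measure how close matched query terms are in a document: the smallest position gap between any two different matched query terms. The abstract does not list the five measures or how they are combined with a retrieval score, so the function name `min_pair_distance`, the whitespace tokenization, and the fallback value for documents matching fewer than two query terms are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, query_terms):
    """Smallest position gap between two different matched query terms.

    Returns len(doc_tokens) when fewer than two distinct query terms match.
    That fallback is an assumption of this sketch, chosen so that such
    documents are treated as having their terms "far apart".
    """
    wanted = set(query_terms)

    # Record every position at which each query term occurs.
    positions = {}
    for index, token in enumerate(doc_tokens):
        if token in wanted:
            positions.setdefault(token, []).append(index)

    if len(positions) < 2:
        return len(doc_tokens)

    # Compare every occurrence of each pair of distinct matched terms.
    best = len(doc_tokens)
    for term_a, term_b in combinations(positions, 2):
        for pos_a in positions[term_a]:
            for pos_b in positions[term_b]:
                best = min(best, abs(pos_a - pos_b))
    return best


doc_near = "proximity measures improve information retrieval models".split()
doc_far = "information about cooking and the retrieval of lost recipes".split()
query = ["information", "retrieval"]

near = min_pair_distance(doc_near, query)  # terms are adjacent
far = min_pair_distance(doc_far, query)    # terms are several words apart
print(near, far)
```

A smaller value means the matched terms sit closer together, so under the heuristic described above `doc_near` would be the better candidate for a score boost. How such a distance is turned into a score adjustment is left open here, since the abstract only says that two heuristic constraints guide that step.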

Original language: English (US)
Title of host publication: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Pages: 295-302
Number of pages: 8
DOIs: https://doi.org/10.1145/1277741.1277794
State: Published - 2007
Event: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 - Amsterdam, Netherlands
Duration: Jul 23 2007 to Jul 27 2007

Publication series

Name: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07

Other

Other: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Country: Netherlands
City: Amsterdam
Period: 7/23/07 to 7/27/07

Keywords

  • Distance measures
  • Proximity
  • Retrieval heuristics

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Applied Mathematics

Cite this

Tao, T., & Zhai, C. (2007). An exploration of proximity measures in information retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 (pp. 295-302). https://doi.org/10.1145/1277741.1277794