Positional language models for information retrieval

Yuanhua Lv, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Although many variants of language models have been proposed for information retrieval, there are two related retrieval heuristics remaining "external" to the language modeling approach: (1) proximity heuristic which rewards a document where the matched query terms occur close to each other; (2) passage retrieval which scores a document mainly based on the best matching passage. Existing studies have only attempted to use a standard language model as a "black box" to implement these heuristics, making it hard to optimize the combination parameters. In this paper, we propose a novel positional language model (PLM) which implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document, and score a document based on the scores of its PLMs. The PLM is estimated based on propagated counts of words within a document through a proximity-based density function, which both captures proximity heuristics and achieves an effect of "soft" passage retrieval. We propose and study several representative density functions and several different PLM-based document ranking strategies. Experiment results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model.

Original languageEnglish (US)
Title of host publicationProceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Pages299-306
Number of pages8
DOIs
StatePublished - 2009
Event32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009 - Boston, MA, United States
Duration: Jul 19 2009Jul 23 2009

Publication series

NameProceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009

Other

Other32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Country/TerritoryUnited States
CityBoston, MA
Period7/19/097/23/09

Keywords

  • Passage retrieval
  • Positional language models
  • Proximity

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'Positional language models for information retrieval'. Together they form a unique fingerprint.

Cite this