Negative query generation: bridging the gap between query likelihood retrieval models and relevance

Yuanhua Lv, Cheng Xiang Zhai

Research output: Contribution to journal › Article › peer-review

Abstract

The language modeling approach to information retrieval has recently attracted much attention. In language modeling retrieval models, we can score and rank documents based on the query likelihood method. From the theoretical perspective, however, the justification of the existing (standard) query likelihood method based on the probability ranking principle requires an unrealistic assumption about the generation of a “negative query” from a document: that the probability that a user who dislikes a document would use a query does not depend on the particular document. This assumption makes it possible to ignore negative query generation and so justifies using the basic query likelihood method as a retrieval function. In reality, however, this assumption does not hold, because a user who dislikes a document would more likely avoid using words in the document when posing a query. This suggests that the standard query likelihood function is a potentially non-optimal retrieval function. In this paper, we attempt to improve the standard language modeling retrieval models by bringing back the component of negative query generation. Specifically, we propose a general and efficient approach to estimate document-dependent probabilities of negative query generation based on the principle of maximum entropy, and derive a more complete query likelihood retrieval function that also contains the negative query generation component. In addition, we develop a more general probabilistic distance retrieval method that naturally incorporates query language models and covers the proposed query likelihood with negative query generation as a special case. The proposed approaches not only bridge the theoretical gap between the standard language modeling retrieval models and the notion of relevance, but also improve retrieval effectiveness with (almost) no additional computational cost.
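
The role of the "negative query" term can be seen in a short worked derivation. The sketch below is an illustration under stated assumptions, not the paper's own notation: the symbols q (query), d (document), and r (binary relevance, r = 1 for liked and r = 0 for disliked) are chosen here for convenience. It shows only the standard odds-ratio form of the probability ranking principle and where the document-independence assumption enters. It does not reproduce the maximum-entropy estimate proposed in the paper.

```latex
% Odds of relevance for a query-document pair, expanded with Bayes' rule
% (notation q, d, r is assumed here for illustration).
\begin{align}
O(r = 1 \mid q, d)
  &= \frac{p(r = 1 \mid q, d)}{p(r = 0 \mid q, d)}
   = \frac{p(q \mid d, r = 1)}{p(q \mid d, r = 0)}
     \cdot \frac{p(r = 1 \mid d)}{p(r = 0 \mid d)} \\[4pt]
\log O(r = 1 \mid q, d)
  &= \underbrace{\log p(q \mid d, r = 1)}_{\text{query likelihood}}
   - \underbrace{\log p(q \mid d, r = 0)}_{\text{negative query generation}}
   + \log \frac{p(r = 1 \mid d)}{p(r = 0 \mid d)}
\end{align}
% Standard assumption: p(q | d, r = 0) = p(q | r = 0), i.e. the negative
% query term does not depend on d. With a document-independent prior as
% well, both terms are constant across documents for a fixed query, so
% ranking by the odds is equivalent to ranking by p(q | d, r = 1).
```

If the middle term is allowed to depend on d, as the abstract argues it should, it no longer cancels in the ranking and has to be estimated for each document.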

Original language: English (US)
Pages (from-to): 359-378
Number of pages: 20
Journal: Information Retrieval
Volume: 18
Issue number: 4
State: Published - Aug 25 2015

Keywords

  • Language model
  • Negative query generation
  • Principle of maximum entropy
  • Probability ranking principle
  • Query likelihood
  • Relevance

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
