A risk minimization framework for information retrieval

Chengxiang Zhai, John Lafferty

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.
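
To make the decision-theoretic framing concrete, here is a minimal sketch of one possible instantiation. It makes assumptions that go beyond the abstract: the query and each document are modeled as unigram language models with additive smoothing, the loss is taken to be the KL divergence between the query model and the document model, and documents are ranked independently by ascending risk. The function names (`language_model`, `kl_divergence`, `rank_by_risk`) and the smoothing parameter `alpha` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter


def language_model(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram model over a fixed vocabulary (illustrative assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def rank_by_risk(query, docs, alpha=0.1):
    """Rank documents by ascending risk, where risk is assumed to be KL(query model || document model)."""
    q_tokens = query.lower().split()
    doc_tokens = {name: text.lower().split() for name, text in docs.items()}
    # Shared vocabulary so that both models are distributions over the same event space.
    vocab = set(q_tokens).union(*doc_tokens.values())
    q_model = language_model(q_tokens, vocab, alpha)
    risks = {
        name: kl_divergence(q_model, language_model(tokens, vocab, alpha))
        for name, tokens in doc_tokens.items()
    }
    # Lowest expected loss first: the risk-minimizing ranking under this loss.
    return sorted(risks.items(), key=lambda item: item[1])


if __name__ == "__main__":
    docs = {
        "d1": "language models for retrieval",
        "d2": "cooking pasta at home",
    }
    for name, risk in rank_by_risk("retrieval language models", docs):
        print(name, round(risk, 3))
```

Because each document is scored on its own, this sketch keeps the traditional independent-relevance assumption. The subtopic retrieval models described in the abstract relax that assumption, so the risk of a document there would also have to depend on the documents already ranked above it.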

Original language: English (US)
Pages (from-to): 31-55
Number of pages: 25
Journal: Information Processing and Management
Volume: 42
Issue number: 1 SPEC. ISS
State: Published - Jan 2006

Keywords

  • Bayesian decision theory
  • Retrieval models
  • Risk minimization
  • Statistical language models

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
