Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval

Chengxiang Zhai, William W. Cohen, John Lafferty

Research output: Contribution to journal › Conference article › peer-review

Abstract

We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking is dependent on other documents in the ranking, violating the assumption of independent relevance which is assumed in most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
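
The dependence between documents in a ranking can be illustrated with a small sketch. It is not the authors' implementation: it uses the classic greedy MMR formulation with a toy Jaccard similarity instead of the statistical language models studied in the paper, and a plain subtopic-coverage fraction instead of the generalized metrics, which also account for topic difficulty and redundancy. All names here (`subtopic_recall`, `mmr_rank`, `jaccard`, the toy documents and scores) are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List, Set


def subtopic_recall(ranking: List[str], subtopics: Dict[str, Set[str]], k: int) -> float:
    """Fraction of all known subtopics covered by the top-k documents."""
    all_subtopics: Set[str] = set().union(*subtopics.values())
    if not all_subtopics:
        return 0.0
    covered: Set[str] = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / len(all_subtopics)


def mmr_rank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: repeatedly pick the document that maximizes
    lam * relevance - (1 - lam) * (max similarity to already selected docs)."""
    remaining = set(relevance)
    selected: List[str] = []
    while remaining:
        def score(doc: str) -> float:
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed, not from the paper): d1 and d2 cover the same subtopics.
subtopics = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.5}


def jaccard(x: str, y: str) -> float:
    """Toy similarity computed on subtopic labels; a real system would use document text."""
    a, b = subtopics[x], subtopics[y]
    return len(a & b) / len(a | b)


by_relevance = sorted(relevance, key=relevance.get, reverse=True)
by_mmr = mmr_rank(relevance, jaccard, lam=0.5)

print(by_relevance, subtopic_recall(by_relevance, subtopics, k=2))
print(by_mmr, subtopic_recall(by_mmr, subtopics, k=2))
```

In this toy setting the relevance-only ranking places the two redundant documents first, while the MMR ranking promotes the document that covers the remaining subtopic, so the top two results cover more subtopics.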

Original language: English (US)
Pages (from-to): 10-17
Number of pages: 8
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Issue number: SPEC. ISS.
State: Published - 2003
Event: Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada
Duration: Jul 28 2003 to Aug 1 2003

Keywords

  • Language models
  • Maximal marginal relevance
  • Subtopic retrieval

ASJC Scopus subject areas

  • Management Information Systems
  • Hardware and Architecture
