Smoothing document language models with probabilistic term count propagation

Azadeh Shakery, Chengxiang Zhai

Research output: Contribution to journal › Article › peer-review

Abstract

Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively propagate counts and thus achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with other smoothing methods that also exploit local corpus structure, our method is especially effective at improving precision among top-ranked documents by "filling in" missing query terms in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine applications.
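The core idea the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: it assumes counts are propagated along a row-normalized document-similarity graph, with an interpolation parameter (here called `alpha`) and a fixed iteration count, both hypothetical names. Iterating lets counts reach a document from remotely related neighbors, so a relevant document can acquire nonzero mass for query terms it never contains.

```python
import numpy as np

def propagate_counts(counts, sim, alpha=0.5, iterations=3):
    """Sketch of iterative probabilistic term-count propagation.

    counts: (n_docs, n_terms) matrix of raw term counts.
    sim:    (n_docs, n_docs) nonnegative document-similarity matrix.
    alpha:  weight on propagated counts (assumed parameter, not from the paper).
    Returns a smoothed count matrix after `iterations` propagation steps.
    """
    # Row-normalize similarities into transition probabilities over the graph.
    W = sim / sim.sum(axis=1, keepdims=True)
    c = counts.astype(float)
    for _ in range(iterations):
        # Mix each document's original counts with counts flowing in
        # from its neighbors; repeated steps reach remote documents.
        c = (1 - alpha) * counts + alpha * (W @ c)
    return c

def language_model(smoothed_counts):
    # Normalize smoothed counts into a per-document term distribution.
    return smoothed_counts / smoothed_counts.sum(axis=1, keepdims=True)
```

With two similar documents where each contains only one of two terms, a single propagation step already gives each document a nonzero count for the term it was missing, which is the "filling in" effect the abstract highlights.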

Original language: English (US)
Pages (from-to): 139-164
Number of pages: 26
Journal: Information Retrieval
Volume: 11
Issue number: 2
State: Published - Apr 2008

Keywords

  • Language models
  • Probabilistic propagation
  • Smoothing
  • Term count propagation

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
