Abstract
Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively propagate counts and thus achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with other smoothing methods that also exploit local corpus structures, our method is especially effective at improving precision among top-ranked documents by "filling in" missing query terms in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine applications.
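The abstract's description of iterative probabilistic propagation can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's exact algorithm: `propagate_counts`, the interpolation weight `alpha`, and the row-normalized document similarity matrix `weights` are all illustrative names, and the update rule (keep a fraction of a document's own counts, mix in neighbors' propagated counts, repeat) is only the generic form such a scheme might take.

```python
# Illustrative sketch of graph-based term-count propagation smoothing.
# NOT the paper's exact algorithm: propagate_counts, alpha, and the toy
# weight matrix below are assumptions made for demonstration only.

def propagate_counts(counts, weights, alpha=0.5, iterations=2):
    """Smooth per-document term counts by propagating them in a document graph.

    counts:     list of dicts mapping term -> count, one dict per document.
    weights:    doc-by-doc similarity matrix; each row is assumed normalized
                so that weights[i][j] sums to 1 over the neighbors j.
    alpha:      fraction of a document's original counts retained each step.
    iterations: number of propagation steps; more steps let counts reach
                remotely related documents, as the abstract describes.
    """
    current = [dict(c) for c in counts]
    for _ in range(iterations):
        updated = []
        for i in range(len(counts)):
            # Keep alpha of the document's own original counts ...
            new_c = {t: alpha * c for t, c in counts[i].items()}
            # ... and mix in (1 - alpha) of the neighbors' current counts.
            for j, w in enumerate(weights[i]):
                if w == 0.0 or i == j:
                    continue
                for t, c in current[j].items():
                    new_c[t] = new_c.get(t, 0.0) + (1.0 - alpha) * w * c
            updated.append(new_c)
        current = updated
    return current
```

A document that never mentions a query term can thus acquire a nonzero pseudo-count for it from related documents, which is how such propagation can "fill in" missing query terms before a final collection-level smoothing step is applied.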
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 139-164 |
| Number of pages | 26 |
| Journal | Information Retrieval |
| Volume | 11 |
| Issue number | 2 |
| DOIs | |
| State | Published - Apr 2008 |
Keywords
- Language models
- Probabilistic propagation
- Smoothing
- Term count propagation
ASJC Scopus subject areas
- Information Systems
- Library and Information Sciences