TY - JOUR
T1 - Smoothing document language models with probabilistic term count propagation
AU - Shakery, Azadeh
AU - Zhai, Chengxiang
N1 - Funding Information:
Acknowledgments: We are grateful to the anonymous reviewers for their constructive comments. This work is supported in part by the National Science Foundation under Grant Numbers 0425852 and 0347933.
PY - 2008/4
Y1 - 2008/4
N2 - Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively propagate counts and achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with other smoothing methods that also exploit local corpus structures, our method is especially effective in improving precision in top-ranked documents by "filling in" missing query terms in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine applications.
AB - Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively propagate counts and achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with other smoothing methods that also exploit local corpus structures, our method is especially effective in improving precision in top-ranked documents by "filling in" missing query terms in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine applications.
KW - Language models
KW - Probabilistic propagation
KW - Smoothing
KW - Term count propagation
UR - http://www.scopus.com/inward/record.url?scp=40549087663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=40549087663&partnerID=8YFLogxK
U2 - 10.1007/s10791-007-9041-9
DO - 10.1007/s10791-007-9041-9
M3 - Article
AN - SCOPUS:40549087663
SN - 1386-4564
VL - 11
SP - 139
EP - 164
JO - Information Retrieval
JF - Information Retrieval
IS - 2
ER -