Abstract
Language model information retrieval depends on accurate estimation of document models. In this paper, we propose a document expansion technique to deal with the problem of insufficient sampling of documents. We construct a probabilistic neighborhood for each document and expand the document with its neighborhood information. The expanded document provides a more accurate estimate of the document model and thus improves retrieval accuracy. Moreover, since document expansion and pseudo feedback exploit different corpus structures, they can be combined to further improve performance. Experimental results on several data sets demonstrate the effectiveness of the proposed document expansion method.
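The expansion step described in the abstract can be illustrated with a small sketch. It is not the paper's exact formulation: the abstract does not say how the neighborhood is built, so cosine similarity over term counts, the interpolation weight `alpha`, the neighborhood size `k`, and the function names `cosine` and `expand_document` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed similarity measure)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_document(doc_id, counts, alpha=0.5, k=2):
    """Mix a document's term counts with those of its k most similar neighbors.

    alpha (hypothetical parameter) is the weight kept on the original document;
    the remaining 1 - alpha is shared among neighbors in proportion to similarity.
    """
    doc = counts[doc_id]
    neighbors = []
    for other_id, other in counts.items():
        if other_id != doc_id:
            sim = cosine(doc, other)
            if sim > 0:
                neighbors.append((sim, other_id))
    neighbors = sorted(neighbors, reverse=True)[:k]

    if not neighbors:
        # No similar documents: leave the counts unchanged.
        return Counter({term: float(c) for term, c in doc.items()})

    total = sum(sim for sim, _ in neighbors)
    expanded = Counter({term: alpha * c for term, c in doc.items()})
    for sim, other_id in neighbors:
        weight = (1 - alpha) * sim / total  # this neighbor's share of the expansion mass
        for term, c in counts[other_id].items():
            expanded[term] += weight * c
    return expanded


# Toy corpus (made-up data for illustration only).
corpus = {
    "d1": "language model retrieval".split(),
    "d2": "language model smoothing estimation".split(),
    "d3": "football match score".split(),
}
counts = {doc_id: Counter(tokens) for doc_id, tokens in corpus.items()}
expanded = expand_document("d1", counts)
```

In this toy corpus, `d1` gains fractional counts for terms that appear only in its neighbor `d2`, while the unrelated `d3` contributes nothing. The expanded counts would then be smoothed and used in place of the raw counts when estimating the document language model.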
Original language | English (US) |
---|---|
Pages | 407-414 |
Number of pages | 8 |
State | Published - 2006 |
Event | 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006 - New York, NY, United States; Duration: Jun 4 2006 → Jun 9 2006 |
Other
Other | 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006 |
---|---|
Country/Territory | United States |
City | New York, NY |
Period | 6/4/06 → 6/9/06 |
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language