Generative model-based metasearch for data fusion in information retrieval

Miles Efron

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

"Data fusion" refers to the problem in information retrieval (IR) where several lists of documents ranked against a query are to be merged into a single ranked list for presentation to a user. Data fusion is also known as "metasearch." In a digital library setting data fusion may support operations such as federated search based on multiple repository representations. This paper presents a novel approach to the fusion problem: generative model-based Metasearch (GeM). We suggest viewing the appearance of documents in a return set as the outcome of a probabilistic process; some documents are likely to occur in the model, while others are unlikely. Using Bayesian parameter estimation to fit a multinomial distribution based on the return sets to be merged, GeM achieves a final ranking by listing documents in decreasing probability of generation under the induced model. We also introduce what we call "the impatient reader" approach to normalizing document ranks in service to the fusion operation. We report results from several experiments on TREC data suggesting that GeM, informed with impatient reader document scores, operates at state-of-the-art levels of effectiveness.

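The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the "impatient reader" formula or the prior used, so the reciprocal-rank weight, the symmetric Dirichlet pseudo-count `alpha`, and the function name `fuse` are all assumptions made for illustration. The sketch accumulates a weighted count for each document across the input lists, takes the posterior-mean estimate of a multinomial over documents, and lists documents in decreasing probability.

```python
from collections import defaultdict


def fuse(ranked_lists, alpha=1.0, weight=lambda rank: 1.0 / rank):
    """Merge ranked lists by posterior-mean multinomial probability.

    ranked_lists: list of lists of document ids, best document first.
    alpha: symmetric Dirichlet pseudo-count (an assumption, not from the paper).
    weight: maps a 1-based rank to a pseudo-count. The reciprocal-rank default
            is a placeholder, NOT the paper's "impatient reader" score.
    """
    counts = defaultdict(float)
    for docs in ranked_lists:
        for rank, doc in enumerate(docs, start=1):
            counts[doc] += weight(rank)

    # Posterior mean of a multinomial under a symmetric Dirichlet prior,
    # taken over the documents that appear in at least one input list.
    total = sum(counts.values()) + alpha * len(counts)
    probs = {doc: (c + alpha) / total for doc, c in counts.items()}

    # Decreasing probability; ties broken by document id for determinism.
    return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    runs = [["a", "b", "c"], ["b", "a", "d"], ["b", "c"]]
    for doc, p in fuse(runs):
        print(f"{doc}\t{p:.3f}")
```

Passing a different `weight` function is the place where a rank-normalization scheme such as the paper's would plug in; the smoothing term `alpha` keeps documents seen in only one list from being driven to a near-zero probability.
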
Original language: English (US)
Title of host publication: JCDL'09 - Proceedings of the 2009 ACM/IEEE Joint Conference on Digital Libraries
Pages: 153-162
Number of pages: 10
State: Published - 2009
Externally published: Yes
Event: 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09 - Austin, TX, United States
Duration: Jun 15 2009 to Jun 19 2009

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09
Country/Territory: United States
City: Austin, TX
Period: 6/15/09 to 6/19/09

Keywords

  • Data fusion
  • Digital libraries
  • Generative models
  • Information retrieval
  • Metasearch
  • Probabilistic models

ASJC Scopus subject areas

  • Engineering (all)
