Model-averaged latent semantic indexing

Miles Efron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This poster introduces a novel approach to information retrieval that uses statistical model averaging to improve latent semantic indexing (LSI). Instead of choosing a single dimensionality $k$ for LSI, we propose using several models of differing dimensionality to inform retrieval. To manage this ensemble, we weight each model's contribution inversely to its AIC (Akaike information criterion). Because AIC estimates, up to a constant, a model's expected Kullback-Leibler divergence from the distribution that generated the data, models closer to that distribution contribute more. We present results on three standard IR test collections, demonstrating significant improvement over both the traditional vector space model and single-model LSI.
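The ensemble described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the poster's implementation. The poster does not give its likelihood or weighting formula here, so the sketch swaps in the standard Akaike weights, $w_i \propto \exp(-\Delta_i/2)$ with $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$, and a hypothetical AIC that assumes i.i.d. Gaussian reconstruction error with $k(m+n+1)$ free parameters for an $m \times n$ term-document matrix. Queries and documents are compared after projection onto the first $k$ left singular vectors, which is one common LSI convention among several. All function names are hypothetical.

```python
import numpy as np


def akaike_weights(aic):
    """Standard Akaike weights: w_i proportional to exp(-(AIC_i - AIC_min) / 2)."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()          # subtract the minimum for numerical stability
    w = np.exp(-delta / 2.0)
    return w / w.sum()


def hypothetical_lsi_aic(s, k, n_terms, n_docs):
    """HYPOTHETICAL AIC for a rank-k LSI model (an assumption, not from the poster).

    Assumes i.i.d. Gaussian reconstruction error, so the log-likelihood depends on
    the residual sum of squares (the sum of the discarded squared singular values),
    and counts k * (n_terms + n_docs + 1) free parameters. Requires k < rank(A).
    """
    n_obs = n_terms * n_docs
    rss = np.sum(s[k:] ** 2)
    n_params = k * (n_terms + n_docs + 1)
    return n_obs * np.log(rss / n_obs) + 2 * n_params


def lsi_cosine_scores(U, s, Vt, q, k):
    """Cosine similarity between query q and every document in a k-dim LSI space."""
    Uk = U[:, :k]
    docs_k = s[:k, None] * Vt[:k, :]   # k x n_docs; equals Uk.T @ A
    q_k = Uk.T @ q                      # project the query onto the same basis
    norms = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k)
    return (q_k @ docs_k) / np.where(norms == 0, 1.0, norms)


def model_averaged_lsi(A, q, ks):
    """Score documents by averaging LSI models of several ranks with Akaike weights."""
    n_terms, n_docs = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # one SVD serves every k
    aic = [hypothetical_lsi_aic(s, k, n_terms, n_docs) for k in ks]
    weights = akaike_weights(aic)
    scores = np.array([lsi_cosine_scores(U, s, Vt, q, k) for k in ks])
    return weights, weights @ scores   # weighted average of per-model scores
```

For example, with `A` an 8-term by 6-document matrix and `q = A[:, 0]`, calling `model_averaged_lsi(A, q, ks=[1, 2, 3, 4])` returns four weights that sum to 1 and six averaged cosine scores. Because the singular value decomposition is computed once and truncated for each $k$, adding more models to the ensemble costs little beyond the per-model scoring. The choice of likelihood and parameter count determines which ranks dominate the average, so a different AIC formulation could shift the weights substantially.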

Original language: English (US)
Title of host publication: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Pages: 755-756
Number of pages: 2
State: Published - 2007
Externally published: Yes
Event: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 - Amsterdam, Netherlands
Duration: Jul 23 2007 to Jul 27 2007

Publication series

Name: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07

Other

Other: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Country/Territory: Netherlands
City: Amsterdam
Period: 7/23/07 to 7/27/07

Keywords

  • Latent semantic indexing
  • Model averaging
  • Model selection

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Applied Mathematics
