Dirichlet aspect weighting: A generalized EM algorithm for integrating external data fields with semantically structured queries by using gradient projection method

Atulya Velivelli, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

In this paper we address the problem of document retrieval with semantically structured queries, i.e., queries in which each term carries a tagged field label. We introduce the Dirichlet Aspect Weighting model, which integrates terms from external databases into the query language model within a Bayesian learning framework. In this model, the Dirichlet prior distribution is governed by parameters that depend on the number of fields in the external databases. The model requires additional examples to augment the semantically structured query; these examples are obtained through pseudo relevance feedback. We formulate a log-likelihood function for the Dirichlet Aspect Weighting model and maximize it with a novel Generalized EM algorithm. On the TREC 2005 Genomics Track dataset, the Dirichlet Aspect Weighting model shows an improvement over baseline methods that use pseudo relevance feedback while incorporating terms from external databases.
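The abstract names the ingredients (a Dirichlet prior, a log-likelihood, a Generalized EM algorithm and, per the title, gradient projection) but not the model equations. The sketch below is a minimal, hypothetical illustration of how such pieces can fit together, not the authors' formulation. It assumes the query's term counts are explained by K fixed component term distributions (for example, the query model plus one per external data field) mixed with weights λ that carry a Dirichlet(α) prior with every α_k ≥ 1. A generalized EM (GEM) algorithm only requires each M-step to increase the expected complete-data objective rather than maximize it; here that step is one gradient ascent step projected back onto the probability simplex with a standard sort-based Euclidean projection, halved until the objective does not decrease. All function and parameter names (`project_to_simplex`, `project_with_floor`, `gem_dirichlet_weights`, `step`, `eps`) are illustrative assumptions.

```python
import numpy as np


def project_to_simplex(y, radius=1.0):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = radius} (sort-based method)."""
    u = np.sort(y)[::-1]                        # entries in descending order
    css = np.cumsum(u) - radius
    idx = np.arange(1, y.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index where the condition holds
    tau = css[rho] / (rho + 1.0)                # threshold subtracted from every entry
    return np.maximum(y - tau, 0.0)


def project_with_floor(y, eps):
    """Projection onto {x : x >= eps, sum(x) = 1}, so every log(x_k) stays finite."""
    return eps + project_to_simplex(y - eps, radius=1.0 - y.size * eps)


def expected_q(coef, lam):
    """M-step objective Q(lam) = sum_k coef_k * log(lam_k), up to an additive constant."""
    return float(coef @ np.log(lam))


def gem_dirichlet_weights(counts, theta, alpha, iters=50, step=0.1, eps=1e-6):
    """Hypothetical GEM for MAP mixing weights under a Dirichlet(alpha) prior, alpha >= 1.

    counts: (W,) term counts; theta: (K, W) fixed, strictly positive term distributions.
    """
    k = theta.shape[0]
    lam = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of component k for each term w
        joint = lam[:, None] * theta
        resp = joint / joint.sum(axis=0, keepdims=True)
        # Expected counts per component plus Dirichlet pseudo-counts (alpha_k - 1)
        coef = resp @ counts + alpha - 1.0
        # Generalized M-step: projected gradient ascent step on Q, halving the step
        # size until Q does not decrease (GEM needs improvement, not maximization)
        grad = coef / lam
        q_old = expected_q(coef, lam)
        t = step
        while t > 1e-12:
            cand = project_with_floor(lam + t * grad, eps)
            if expected_q(coef, cand) >= q_old:
                lam = cand
                break
            t *= 0.5
    return lam


# Toy usage: two components over a three-term vocabulary, flat prior (alpha = 1)
theta = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.2, 0.7]])
counts = np.array([5.0, 1.0, 1.0])
print(gem_dirichlet_weights(counts, theta, alpha=np.array([1.0, 1.0])))
```

For this toy mixture the M-step also has a closed form (λ_k proportional to the expected count plus α_k − 1), so the projected step is purely illustrative; a gradient projection step is most useful when the M-step objective has no closed-form maximizer but the weights must still stay on the simplex.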

Original language: English (US)
Title of host publication: Proceedings - Sixth International Conference on Data Mining, ICDM 2006
Pages: 633-644
Number of pages: 12
State: Published - 2006
Externally published: Yes
Event: 6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China
Duration: Dec 18 2006 to Dec 22 2006

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 6th International Conference on Data Mining, ICDM 2006
Country/Territory: China
City: Hong Kong
Period: 12/18/06 to 12/22/06

ASJC Scopus subject areas

  • General Engineering
