Unsupervised query segmentation using clickthrough for information retrieval

Yanen Li, Bo June Hsu, Cheng Xiang Zhai, Kuansan Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Query segmentation is an important step toward understanding queries accurately, which is essential for improving search results. Existing segmentation models either use labeled data to predict segmentation boundaries, for which the training data is expensive to collect, or employ an unsupervised strategy based on a large text corpus, which might be inaccurate because of the lack of relevant information. In this paper, we propose a probabilistic model that exploits clickthrough data for query segmentation, where the model parameters are estimated via an efficient EM algorithm. We further study how to properly interpret the segmentation results and utilize them to improve retrieval accuracy. Specifically, we propose an integrated language model based on the standard bigram language model to exploit the probabilistic structure obtained through query segmentation. Experimental results on two datasets show that our segmentation model outperforms existing segmentation models. Furthermore, extensive experiments on a large retrieval dataset reveal that the results of query segmentation can be leveraged to improve retrieval relevance by using the proposed integrated language model.

Original language: English (US)
Title of host publication: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 285-294
Number of pages: 10
ISBN (Print): 9781450309349
State: Published - 2011
Event: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China
Duration: Jul 24 2011 → Jul 28 2011

Publication series

Name: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011
Country/Territory: China
City: Beijing
Period: 7/24/11 → 7/28/11

Keywords

  • Expectation maximization algorithm
  • Language modeling
  • QSLM
  • Query segmentation

ASJC Scopus subject areas

  • Information Systems
