A hierarchical Dirichlet model for taxonomy expansion for search engines

Jingjing Wang, Changsung Kang, Yi Chang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Emerging trends and products pose a challenge to modern search engines, which must adapt to the constantly changing needs and interests of users. For example, vertical search engines, such as Amazon, eBay, Walmart, Yelp and Yahoo! Local, provide business category hierarchies for people to navigate through millions of business listings. The category information also provides important ranking features that can be used to improve the search experience. However, category hierarchies are often manually crafted by human experts and are far from complete. Manually constructed category hierarchies cannot handle ever-changing and sometimes long-tail user information needs. In this paper, we study the problem of how to expand an existing category hierarchy for a search/navigation system to accommodate the information needs of users more comprehensively. We propose a general framework for this task, which has three steps: 1) detecting meaningful missing categories; 2) modeling the category hierarchy using a hierarchical Dirichlet model and predicting the optimal tree structure according to the model; and 3) reorganizing the corpus using the complete category structure, i.e., associating each webpage with the relevant categories from the complete category hierarchy. Experimental results demonstrate that our proposed framework generates a high-quality category hierarchy and significantly boosts retrieval performance.

Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Original language: English (US)
Title of host publication: WWW 2014 - Proceedings of the 23rd International Conference on World Wide Web
Publisher: Association for Computing Machinery
Pages: 961-970
Number of pages: 10
ISBN (Electronic): 9781450327442
State: Published - Apr 7 2014
Event: 23rd International Conference on World Wide Web, WWW 2014 - Seoul, Korea, Republic of
Duration: Apr 7 2014 → Apr 11 2014

Publication series

Name: WWW 2014 - Proceedings of the 23rd International Conference on World Wide Web

Other

Other: 23rd International Conference on World Wide Web, WWW 2014
Country/Territory: Korea, Republic of
City: Seoul
Period: 4/7/14 → 4/11/14

Keywords

  • Dirichlet distribution
  • Local search
  • Missing categories
  • Taxonomy expansion

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
