TY - GEN
T1 - A hierarchical Dirichlet model for taxonomy expansion for search engines
AU - Wang, Jingjing
AU - Kang, Changsung
AU - Chang, Yi
AU - Han, Jiawei
PY - 2014/4/7
Y1 - 2014/4/7
AB - Emerging trends and products pose a challenge to modern search engines, which must adapt to the constantly changing needs and interests of users. For example, vertical search engines such as Amazon, eBay, Walmart, Yelp and Yahoo! Local provide business category hierarchies that let people navigate through millions of business listings. The category information also provides important ranking features that can be used to improve the search experience. However, category hierarchies are often manually crafted by human experts and are far from complete. Manually constructed category hierarchies cannot handle ever-changing and sometimes long-tail user information needs. In this paper, we study the problem of how to expand an existing category hierarchy for a search/navigation system to accommodate the information needs of users more comprehensively. We propose a general framework for this task, which has three steps: 1) detecting meaningful missing categories; 2) modeling the category hierarchy using a hierarchical Dirichlet model and predicting the optimal tree structure according to the model; 3) reorganizing the corpus using the complete category structure, i.e., associating each webpage with the relevant categories from the complete category hierarchy. Experimental results demonstrate that our proposed framework generates a high-quality category hierarchy and significantly boosts retrieval performance. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
KW - Dirichlet distribution
KW - Local search
KW - Missing categories
KW - Taxonomy expansion
UR - http://www.scopus.com/inward/record.url?scp=84909633605&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84909633605&partnerID=8YFLogxK
U2 - 10.1145/2566486.2568037
DO - 10.1145/2566486.2568037
M3 - Conference contribution
AN - SCOPUS:84909633605
T3 - WWW 2014 - Proceedings of the 23rd International Conference on World Wide Web
SP - 961
EP - 970
BT - WWW 2014 - Proceedings of the 23rd International Conference on World Wide Web
PB - Association for Computing Machinery
T2 - 23rd International Conference on World Wide Web, WWW 2014
Y2 - 7 April 2014 through 11 April 2014
ER -