TY - GEN
T1 - Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts
AU - Ren, Xiang
AU - Wang, Yujing
AU - Yu, Xiao
AU - Yan, Jun
AU - Chen, Zheng
AU - Han, Jiawei
PY - 2014
Y1 - 2014
N2 - The problem of learning user search intents has attracted intensive attention from both industry and academia. However, state-of-the-art intent learning algorithms suffer from different drawbacks when using only a single type of data source. For example, query text has difficulty distinguishing ambiguous queries; search logs are biased by the order of search results and users' noisy click behaviors. In this work, we leverage, for the first time, three types of objects, namely queries, web pages and Wikipedia concepts, collaboratively for learning generic search intents, and construct a heterogeneous graph to represent multiple types of relationships between them. A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph. With the proposed co-clustering method, one can enhance the quality of intent understanding by taking advantage of different types of data, which complement each other, and make the implicit intents easier to interpret with explicit knowledge from Wikipedia concepts. Experiments on two real-world datasets demonstrate the power of the proposed method, which achieves a 9.25% improvement in terms of NDCG on a search ranking task and a 4.67% enhancement in terms of Rand index on an object co-clustering task compared to the best state-of-the-art method.
AB - The problem of learning user search intents has attracted intensive attention from both industry and academia. However, state-of-the-art intent learning algorithms suffer from different drawbacks when using only a single type of data source. For example, query text has difficulty distinguishing ambiguous queries; search logs are biased by the order of search results and users' noisy click behaviors. In this work, we leverage, for the first time, three types of objects, namely queries, web pages and Wikipedia concepts, collaboratively for learning generic search intents, and construct a heterogeneous graph to represent multiple types of relationships between them. A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph. With the proposed co-clustering method, one can enhance the quality of intent understanding by taking advantage of different types of data, which complement each other, and make the implicit intents easier to interpret with explicit knowledge from Wikipedia concepts. Experiments on two real-world datasets demonstrate the power of the proposed method, which achieves a 9.25% improvement in terms of NDCG on a search ranking task and a 4.67% enhancement in terms of Rand index on an object co-clustering task compared to the best state-of-the-art method.
KW - heterogeneous graph clustering
KW - search intent
KW - wikipedia
UR - http://www.scopus.com/inward/record.url?scp=84906852428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906852428&partnerID=8YFLogxK
U2 - 10.1145/2556195.2556222
DO - 10.1145/2556195.2556222
M3 - Conference contribution
AN - SCOPUS:84906852428
SN - 9781450323512
T3 - WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
SP - 23
EP - 32
BT - WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery
T2 - 7th ACM International Conference on Web Search and Data Mining, WSDM 2014
Y2 - 24 February 2014 through 28 February 2014
ER -