TY - GEN
T1 - Learning search tasks in queries and web pages via graph regularization
AU - Ji, Ming
AU - Yan, Jun
AU - Gu, Siyu
AU - Han, Jiawei
AU - He, Xiaofei
AU - Zhang, Wei Vivian
AU - Chen, Zheng
PY - 2011
Y1 - 2011
AB - As the Internet grows explosively, search engines play a more and more important role for users in effectively accessing online information. Recently, it has been recognized that a query is often triggered by a search task that the user wants to accomplish. Similarly, many web pages are specifically designed to help accomplish a certain task. Therefore, learning hidden tasks behind queries and web pages can help search engines return the most useful web pages to users by task matching. For instance, the search task that triggers query "thinkpad T410 broken" is to maintain a computer, and it is desirable for a search engine to return the Lenovo troubleshooting page on the top of the list. However, existing search engine technologies mainly focus on topic detection or relevance ranking, which are not able to predict the task that triggers a query and the task a web page can accomplish. In this paper, we propose to simultaneously classify queries and web pages into the popular search tasks by exploiting their content together with click-through logs. Specifically, we construct a task-oriented heterogeneous graph among queries and web pages. Each pair of objects in the graph are linked together as long as they potentially share similar search tasks. A novel graph-based regularization algorithm is designed for search task prediction by leveraging the graph. Extensive experiments in real search log data demonstrate the effectiveness of our method over state-of-the-art classifiers, and the search performance can be significantly improved by using the task prediction results as additional information.
KW - Classification
KW - Graph regularization
KW - Web search task
UR - http://www.scopus.com/inward/record.url?scp=80052122769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052122769&partnerID=8YFLogxK
U2 - 10.1145/2009916.2009928
DO - 10.1145/2009916.2009928
M3 - Conference contribution
AN - SCOPUS:80052122769
SN - 9781450309349
T3 - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 55
EP - 64
BT - SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery
T2 - 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011
Y2 - 24 July 2011 through 28 July 2011
ER -