TY - GEN
T1 - Mining evidences for named entity disambiguation
AU - Li, Yang
AU - Wang, Chi
AU - Han, Fangqiu
AU - Han, Jiawei
AU - Roth, Dan
AU - Yan, Xifeng
N1 - Funding Information:
This research was sponsored in part by the Army Research Laboratory under cooperative agreements W911NF-09-2-0053, NSF IIS-0954125, and DARPA under agreement number FA8750-13-2-0008. Chi Wang was supported by Microsoft Research PhD Fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, DARPA or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notice herein.
Publisher Copyright:
Copyright © 2013 ACM.
PY - 2013/8/11
Y1 - 2013/8/11
N2 - Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and link them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, the existing knowledge bases are rarely complete (likely will never be), thus leading to poor performance on short queries with not well-known contexts. In such cases, we need to collect additional evidences scattered in internal and external corpus to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. With a specific modeling of "background topic" and "unknown entities", our model is able to harvest useful evidences out of noisy information. Experimental results show that our proposed method outperforms the state-of-The-Art approaches significantly: boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
AB - Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and link them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, the existing knowledge bases are rarely complete (likely will never be), thus leading to poor performance on short queries with not well-known contexts. In such cases, we need to collect additional evidences scattered in internal and external corpus to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. With a specific modeling of "background topic" and "unknown entities", our model is able to harvest useful evidences out of noisy information. Experimental results show that our proposed method outperforms the state-of-The-Art approaches significantly: boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
KW - Entity disambiguation
KW - Evidencemining
KW - Generativemodel
KW - Knowledge expansion
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85006147542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006147542&partnerID=8YFLogxK
U2 - 10.1145/2487575.2487681
DO - 10.1145/2487575.2487681
M3 - Conference contribution
AN - SCOPUS:85006147542
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1070
EP - 1078
BT - KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A2 - Parekh, Rajesh
A2 - He, Jingrui
A2 - Inderjit, Dhillon S.
A2 - Bradley, Paul
A2 - Koren, Yehuda
A2 - Ghani, Rayid
A2 - Senator, Ted E.
A2 - Grossman, Robert L.
A2 - Uthurusamy, Ramasamy
PB - Association for Computing Machinery
T2 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
Y2 - 11 August 2013 through 14 August 2013
ER -