Mining evidences for named entity disambiguation

Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, Dan Roth, Xifeng Yan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and link them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, the existing knowledge bases are rarely complete (likely will never be), thus leading to poor performance on short queries with not well-known contexts. In such cases, we need to collect additional evidences scattered in internal and external corpus to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. With a specific modeling of "background topic" and "unknown entities", our model is able to harvest useful evidences out of noisy information. Experimental results show that our proposed method outperforms the state-of-The-Art approaches significantly: boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.

Original languageEnglish (US)
Title of host publicationKDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
EditorsRajesh Parekh, Jingrui He, Dhillon S. Inderjit, Paul Bradley, Yehuda Koren, Rayid Ghani, Ted E. Senator, Robert L. Grossman, Ramasamy Uthurusamy
PublisherAssociation for Computing Machinery
Pages1070-1078
Number of pages9
ISBN (Electronic)9781450321747
DOIs
StatePublished - Aug 11 2013
Event19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States
Duration: Aug 11 2013Aug 14 2013

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
VolumePart F128815

Other

Other19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
Country/TerritoryUnited States
CityChicago
Period8/11/138/14/13

Keywords

  • Entity disambiguation
  • Evidencemining
  • Generativemodel
  • Knowledge expansion
  • Semi-supervised learning

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this