A probabilistic model for linking named entities in Web text with heterogeneous information networks

Wei Shen, Jiawei Han, Jianyong Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Heterogeneous information networks that consist of multitype, interconnected objects are becoming ubiquitous and increasingly popular, such as social media networks and bibliographic networks. The task to link named entity mentions detected from the unstructured Web text with their corresponding entities existing in a heterogeneous information network is of practical importance for the problem of information network population and enrichment. This task is challenging due to name ambiguity and limited knowledge existing in the information network. Most existing entity linking methods focus on linking entities with Wikipedia or Wikipedia-derived knowledge bases (e.g., YAGO), and are largely dependent on the special features associated with Wikipedia (e.g., Wikipedia articles or Wikipedia-based relatedness measures). Since heterogeneous information networks do not have such features, these previous methods cannot be applied to our task. In this paper, we propose SHINE, the first probabilistic model to link the named entitieS in Web text with a Heterogeneous Information NEtwork to the best of our knowledge. Our model consists of two components: the entity popularity model that captures the popularity of an entity, and the entity object model that captures the distribution of multi-type objects appearing in the textual context of an entity, which is generated using metapath constrained random walks over networks. As different meta-paths express diverse semantic meanings and lead to various distributions over objects, different paths have different weights in entity linking. We propose an effective iterative approach to automatically learning the weights for each meta-path based on the expectation-maximization (EM) algorithm without requiring any training data. Experimental results on a real world data set demonstrate the effectiveness and efficiency of our proposed model in comparison with the baselines.

Original languageEnglish (US)
Title of host publicationSIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages1199-1210
Number of pages12
ISBN (Print)9781450323765
DOIs
StatePublished - 2014
Event2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014 - Snowbird, UT, United States
Duration: Jun 22 2014Jun 27 2014

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print)0730-8078

Other

Other2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
CountryUnited States
CitySnowbird, UT
Period6/22/146/27/14

Keywords

  • Domainspecific entity linking
  • Entity linking
  • Heterogeneous information networks

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'A probabilistic model for linking named entities in Web text with heterogeneous information networks'. Together they form a unique fingerprint.

Cite this