Liberal entity extraction: Rapid construction of fine-grained entity typing systems

Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji, Xiang Ren, Jiawei Han, Lin Zhao, James A. Hendler

Research output: Contribution to journal, Article

Abstract

Automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forum) show performance comparable to that of state-of-the-art supervised typing systems trained on a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.

Original language: English (US)
Pages (from-to): 19-31
Number of pages: 13
Journal: Big Data
Volume: 5
Issue number: 1
State: Published - Mar 2017

Keywords

  • Liberal Information Extraction
  • fine-grained entity typing
  • multi-level entity mention and representation
  • unsupervised learning

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Information Systems and Management
