Automatic Entity Recognition and Typing in Massive Text Corpora

Xiang Ren, Ahmed El-Kishky, Chi Wang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In today's computerized and information-based society, we are inundated with vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text corpora (especially in massive, domain-specific text corpora). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., people, product, food) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.

Original language: English (US)
Title of host publication: WWW 2016 Companion - Proceedings of the 25th International Conference on World Wide Web
Publisher: Association for Computing Machinery
Pages: 1025-1028
Number of pages: 4
ISBN (Electronic): 9781450341448
State: Published - Apr 11 2016
Event: 25th International Conference on World Wide Web, WWW 2016 - Montreal, Canada
Duration: May 11 2016 to May 15 2016

Publication series

Name: WWW 2016 Companion - Proceedings of the 25th International Conference on World Wide Web

Conference

Conference: 25th International Conference on World Wide Web, WWW 2016
Country/Territory: Canada
City: Montreal
Period: 5/11/16 to 5/15/16

Keywords

  • entity recognition and typing
  • massive text corpora

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
