Constructing structured information networks from massive text corpora

Xiang Ren, Meng Jiang, Jingbo Shang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (e.g., medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities with attached attributes, and edges are different relations between entities) for text corpora of different kinds (especially massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets, including news articles, scientific publications, tweets, and reviews, how these constructed networks aid text analytics and knowledge discovery at a large scale.

Original language: English (US)
Title of host publication: 26th International World Wide Web Conference 2017, WWW 2017 Companion
Publisher: International World Wide Web Conferences Steering Committee
Pages: 951-954
Number of pages: 4
ISBN (Electronic): 9781450349147
State: Published - 2017
Event: 26th International World Wide Web Conference, WWW 2017 Companion - Perth, Australia
Duration: Apr 3 2017 to Apr 7 2017

Publication series

Name: 26th International World Wide Web Conference 2017, WWW 2017 Companion

Other

Other: 26th International World Wide Web Conference, WWW 2017 Companion
Country/Territory: Australia
City: Perth
Period: 4/3/17 to 4/7/17

Keywords

  • Attribute Discovery
  • Entity Recognition and Typing
  • Massive Text Corpora
  • Quality Phrase Mining
  • Relation Extraction

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
