Constructing structured information networks from massive text corpora

Xiang Ren, Meng Jiang, Jingbo Shang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities with attached attributes, and edges are different relations between entities) for text corpora of different kinds (especially for massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets including news articles, scientific publications, tweets, and reviews how these constructed networks aid in text analytics and knowledge discovery at a large scale.

Original language: English (US)
Title of host publication: 26th International World Wide Web Conference 2017, WWW 2017 Companion
Publisher: International World Wide Web Conferences Steering Committee
Pages: 951-954
Number of pages: 4
ISBN (Electronic): 9781450349147
DOIs: https://doi.org/10.1145/3041021.3051107
State: Published - Jan 1 2019
Event: 26th International World Wide Web Conference, WWW 2017 Companion - Perth, Australia
Duration: Apr 3 2017 to Apr 7 2017

Publication series

Name: 26th International World Wide Web Conference 2017, WWW 2017 Companion

Other

Other: 26th International World Wide Web Conference, WWW 2017 Companion
Country: Australia
City: Perth
Period: 4/3/17 to 4/7/17

Fingerprint

Data mining

Keywords

  • Attribute Discovery
  • Entity Recognition and Typing
  • Massive Text Corpora
  • Quality Phrase Mining
  • Relation Extraction

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Cite this

Ren, X., Jiang, M., Shang, J., & Han, J. (2019). Constructing structured information networks from massive text corpora. In 26th International World Wide Web Conference 2017, WWW 2017 Companion (pp. 951-954). (26th International World Wide Web Conference 2017, WWW 2017 Companion). International World Wide Web Conferences Steering Committee. https://doi.org/10.1145/3041021.3051107

@inproceedings{8936967e0c2c4351874d08a66b5e9092,
title = "Constructing structured information networks from massive text corpora",
abstract = "In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities with attached attributes, and edges are different relations between entities) for text corpora of different kinds (especially for massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets including news articles, scientific publications, tweets, and reviews how these constructed networks aid in text analytics and knowledge discovery at a large scale.",
keywords = "Attribute Discovery, Entity Recognition and Typing, Massive Text Corpora, Quality Phrase Mining, Relation Extraction",
author = "Xiang Ren and Meng Jiang and Jingbo Shang and Jiawei Han",
year = "2019",
month = "1",
day = "1",
doi = "10.1145/3041021.3051107",
language = "English (US)",
series = "26th International World Wide Web Conference 2017, WWW 2017 Companion",
publisher = "International World Wide Web Conferences Steering Committee",
pages = "951--954",
booktitle = "26th International World Wide Web Conference 2017, WWW 2017 Companion",

}

TY - GEN

T1 - Constructing structured information networks from massive text corpora

AU - Ren, Xiang

AU - Jiang, Meng

AU - Shang, Jingbo

AU - Han, Jiawei

PY - 2019/1/1

Y1 - 2019/1/1

N2 - In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities with attached attributes, and edges are different relations between entities) for text corpora of different kinds (especially for massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets including news articles, scientific publications, tweets, and reviews how these constructed networks aid in text analytics and knowledge discovery at a large scale.

KW - Attribute Discovery

KW - Entity Recognition and Typing

KW - Massive Text Corpora

KW - Quality Phrase Mining

KW - Relation Extraction

UR - http://www.scopus.com/inward/record.url?scp=85051486166&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051486166&partnerID=8YFLogxK

U2 - 10.1145/3041021.3051107

DO - 10.1145/3041021.3051107

M3 - Conference contribution

AN - SCOPUS:85051486166

T3 - 26th International World Wide Web Conference 2017, WWW 2017 Companion

SP - 951

EP - 954

BT - 26th International World Wide Web Conference 2017, WWW 2017 Companion

PB - International World Wide Web Conferences Steering Committee

ER -