TY - GEN
T1 - Constructing structured information networks from massive text corpora
AU - Ren, Xiang
AU - Jiang, Meng
AU - Shang, Jingbo
AU - Han, Jiawei
N1 - Publisher Copyright: © 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
PY - 2017
Y1 - 2017
N2 - In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (e.g., medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities with attached attributes, and edges are different relations between entities) for text corpora of different kinds (especially for massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets, including news articles, scientific publications, tweets, and reviews, how these constructed networks aid in text analytics and knowledge discovery at a large scale.
AB - In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (e.g., medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities with attached attributes, and edges are different relations between entities) for text corpora of different kinds (especially for massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets, including news articles, scientific publications, tweets, and reviews, how these constructed networks aid in text analytics and knowledge discovery at a large scale.
KW - Attribute Discovery
KW - Entity Recognition and Typing
KW - Massive Text Corpora
KW - Quality Phrase Mining
KW - Relation Extraction
UR - http://www.scopus.com/inward/record.url?scp=85051486166&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051486166&partnerID=8YFLogxK
U2 - 10.1145/3041021.3051107
DO - 10.1145/3041021.3051107
M3 - Conference contribution
AN - SCOPUS:85051486166
T3 - 26th International World Wide Web Conference 2017, WWW 2017 Companion
SP - 951
EP - 954
BT - 26th International World Wide Web Conference 2017, WWW 2017 Companion
PB - International World Wide Web Conferences Steering Committee
T2 - 26th International World Wide Web Conference, WWW 2017 Companion
Y2 - 3 April 2017 through 7 April 2017
ER -