TY - GEN
T1 - Constructing and mining heterogeneous information networks from massive text
AU - Shang, Jingbo
AU - Shen, Jiaming
AU - Liu, Liyuan
AU - Han, Jiawei
N1 - Publisher Copyright: © 2019 Copyright held by the owner/author(s).
PY - 2019/7/25
Y1 - 2019/7/25
AB - Real-world data exists largely in the form of unstructured text. A grand challenge in data mining research is to develop effective and scalable methods that transform unstructured text into structured knowledge. In our view, it is highly beneficial to transform such text into structured heterogeneous information networks, from which actionable knowledge can be generated according to the user's needs. In this tutorial, we provide a comprehensive overview of recent research and development in this direction. First, we introduce a series of effective methods that construct heterogeneous information networks from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks according to the user's needs. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.
KW - Entity Recognition
KW - Massive Text Corpora
KW - Network Mining and Applications
KW - Phrase Mining
KW - Taxonomy Construction
UR - http://www.scopus.com/inward/record.url?scp=85071167059&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071167059&partnerID=8YFLogxK
U2 - 10.1145/3292500.3332275
DO - 10.1145/3292500.3332275
M3 - Conference contribution
AN - SCOPUS:85071167059
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 3191
EP - 3192
BT - KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Y2 - 4 August 2019 through 8 August 2019
ER -