Constructing and mining heterogeneous information networks from massive text

Jingbo Shang, Jiaming Shen, Liyuan Liu, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Real-world data exists largely in the form of unstructured text. A grand challenge in data mining research is to develop effective and scalable methods that transform unstructured text into structured knowledge. In our view, it is highly beneficial to transform such text into structured heterogeneous information networks, from which actionable knowledge can be generated according to the user's needs. In this tutorial, we provide a comprehensive overview of recent research and development in this direction. First, we introduce a series of effective methods that construct heterogeneous information networks from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks according to the user's needs. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.

Original language: English (US)
Title of host publication: KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 3191-3192
Number of pages: 2
ISBN (Electronic): 9781450362016
DOIs: https://doi.org/10.1145/3292500.3332275
State: Published - Jul 25 2019
Event: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019 - Anchorage, United States
Duration: Aug 4 2019 to Aug 8 2019

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Country: United States
City: Anchorage
Period: 8/4/19 to 8/8/19

Fingerprint

Data mining

Keywords

  • Entity Recognition
  • Massive Text Corpora
  • Network Mining and Applications
  • Phrase Mining
  • Taxonomy Construction

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Shang, J., Shen, J., Liu, L., & Han, J. (2019). Constructing and mining heterogeneous information networks from massive text. In KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3191-3192). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/3292500.3332275

@inproceedings{341c24d5c54b4998b1f0fcbaa8901e92,
title = "Constructing and mining heterogeneous information networks from massive text",
abstract = "Real-world data exists largely in the form of unstructured texts. A grand challenge on data mining research is to develop effective and scalable methods that may transform unstructured text into structured knowledge. Based on our vision, it is highly beneficial to transform such text into structured heterogeneous information networks, on which actionable knowledge can be generated based on the user's need. In this tutorial, we provide a comprehensive overview on recent research and development in this direction. First, we introduce a series of effective methods that construct heterogeneous information networks from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks based on the user's need. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.",
keywords = "Entity Recognition, Massive Text Corpora, Network Mining and Applications, Phrase Mining, Taxonomy Construction",
author = "Jingbo Shang and Jiaming Shen and Liyuan Liu and Jiawei Han",
year = "2019",
month = "7",
day = "25",
doi = "10.1145/3292500.3332275",
language = "English (US)",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "3191--3192",
booktitle = "KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Constructing and mining heterogeneous information networks from massive text

AU - Shang, Jingbo

AU - Shen, Jiaming

AU - Liu, Liyuan

AU - Han, Jiawei

PY - 2019/7/25

Y1 - 2019/7/25

AB - Real-world data exists largely in the form of unstructured texts. A grand challenge on data mining research is to develop effective and scalable methods that may transform unstructured text into structured knowledge. Based on our vision, it is highly beneficial to transform such text into structured heterogeneous information networks, on which actionable knowledge can be generated based on the user's need. In this tutorial, we provide a comprehensive overview on recent research and development in this direction. First, we introduce a series of effective methods that construct heterogeneous information networks from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks based on the user's need. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.

KW - Entity Recognition

KW - Massive Text Corpora

KW - Network Mining and Applications

KW - Phrase Mining

KW - Taxonomy Construction

UR - http://www.scopus.com/inward/record.url?scp=85071167059&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071167059&partnerID=8YFLogxK

U2 - 10.1145/3292500.3332275

DO - 10.1145/3292500.3332275

M3 - Conference contribution

AN - SCOPUS:85071167059

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 3191

EP - 3192

BT - KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -