Abstract
Today's society is immersed in a wealth of text data, ranging from news articles and social media to research literature, medical records, and corporate reports. A grand challenge of data science and engineering is to develop effective and scalable methods that extract structures and knowledge from massive text data to serve diverse applications without extensive, corpus-specific human annotation. In this tutorial, we show that the TextCube provides a critical information organization structure that satisfies this need. We overview a set of recently developed data-driven methods that facilitate automated construction of TextCubes from massive, domain-specific text corpora, and show that the resulting TextCubes enhance text exploration and analysis for various applications. We focus on new TextCube construction methods that are scalable, weakly supervised, domain-independent, language-agnostic, and effective (i.e., they generate quality TextCubes from large corpora in various domains). Using real datasets (including news articles, scientific publications, and product reviews), we demonstrate how TextCubes can be constructed to assist multidimensional analysis of massive text corpora.
Original language | English (US) |
---|---|
Pages (from-to) | 1974-1977 |
Number of pages | 4 |
Journal | Proceedings of the VLDB Endowment |
Volume | 12 |
Issue number | 12 |
DOIs | |
State | Published - 2019 |
Event | 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States Duration: Aug 26 2019 → Aug 30 2019 |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science