TextCube: Automated construction and multidimensional exploration

Yu Meng, Jiaxin Huang, Jingbo Shang, Jiawei Han

Research output: Contribution to journal › Conference article › peer-review

Abstract

Today's society is immersed in a wealth of text data, ranging from news articles and social media to research literature, medical records, and corporate reports. A grand challenge of data science and engineering is to develop effective and scalable methods to extract structures and knowledge from massive text data to satisfy diverse applications, without extensive, corpus-specific human annotations. In this tutorial, we show that TextCube provides a critical information organization structure that will satisfy such an information need. We overview a set of recently developed data-driven methods that facilitate automated construction of TextCubes from massive, domain-specific text corpora, and show that TextCubes so constructed will enhance text exploration and analysis for various applications. We focus on new TextCube construction methods that are scalable, weakly-supervised, domain-independent, language-agnostic, and effective (i.e., generating quality TextCubes from large corpora of various domains). We will demonstrate, using real datasets (including news articles, scientific publications, and product reviews), how TextCubes can be constructed to assist multidimensional analysis of massive text corpora.

Original language: English (US)
Pages (from-to): 1974-1977
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 12
State: Published - 2019
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: Aug 26 2019 to Aug 30 2019

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
