From Unstructured Text to TextCube: Automated Construction and Multidimensional Exploration

Research output: Contribution to conference, Paper, peer-review

Abstract

Real-world big data are largely unstructured, interconnected, and dynamic, in the form of natural language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data, an approach that may not scale, especially because many text corpora are highly dynamic and domain-specific. We believe that massive text data itself may disclose a large body of hidden patterns, structures, and knowledge. Using domain-independent and domain-dependent knowledge bases, we propose to explore the power of massive data itself for turning unstructured data into structured knowledge. By organizing massive text documents into multidimensional text cubes, we show that structured knowledge can be extracted and used effectively. In this talk, we introduce a set of methods recently developed in our group for such an exploration, including quality phrase mining, entity recognition and typing, multi-faceted taxonomy construction, and the construction and exploration of multidimensional text cubes. We show that a data-driven approach could be a promising direction for transforming massive text data into structured knowledge.

Original language: English (US)
Pages: 5-6
Number of pages: 2
State: Published - 2019
Event: 28th ACM International Conference on Information and Knowledge Management, CIKM 2019 - Beijing, China
Duration: Nov 3 2019 - Nov 7 2019

Conference

Conference: 28th ACM International Conference on Information and Knowledge Management, CIKM 2019
Country/Territory: China
City: Beijing
Period: 11/3/19 - 11/7/19

Keywords

  • Data mining
  • text embedding
  • text mining
  • textcube construction

ASJC Scopus subject areas

  • Business, Management and Accounting(all)
  • Decision Sciences(all)
