Mining Structures from Massive Text Data: A Data-Driven Approach

Research output: Contribution to journal › Conference article

Abstract

Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to mine structures from such massive unstructured data and transform such big data into structured networks and actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entity-attribute-value triples can be extracted from meta-patterns discovered from such data. Finally, we propose a data-to-network-to-knowledge paradigm, that is, first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that such a paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.

Original language: English (US)
Pages (from-to): 16-19
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 2029
State: Published - Jan 1 2017
Event: 4th Annual International Symposium on Information Management and Big Data, SIMBig 2017 - Lima, Peru
Duration: Sep 4 2017 to Sep 6 2017

Fingerprint

Big data

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Mining Structures from Massive Text Data: A Data-Driven Approach. / Han, Jiawei.

In: CEUR Workshop Proceedings, Vol. 2029, 01.01.2017, p. 16-19.


@article{c9b7efd31e4d473a97a3fe7e77d9abc2,
title = "Mining Structures from Massive Text Data: A Data-Driven Approach",
abstract = "Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to mine structures from such massive unstructured data and transform such big data into structured networks and actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entity-attribute-value triples can be extracted from meta-patterns discovered from such data. Finally, we propose a data-to-network-to-knowledge paradigm, that is, first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that such a paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.",
author = "Jiawei Han",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
volume = "2029",
pages = "16--19",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",

}

TY  - JOUR

T1  - Mining Structures from Massive Text Data: A Data-Driven Approach

AU  - Han, Jiawei

PY  - 2017/1/1

Y1  - 2017/1/1

N2  - Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to mine structures from such massive unstructured data and transform such big data into structured networks and actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entity-attribute-value triples can be extracted from meta-patterns discovered from such data. Finally, we propose a data-to-network-to-knowledge paradigm, that is, first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that such a paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.

AB  - Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to mine structures from such massive unstructured data and transform such big data into structured networks and actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entity-attribute-value triples can be extracted from meta-patterns discovered from such data. Finally, we propose a data-to-network-to-knowledge paradigm, that is, first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that such a paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.

UR  - http://www.scopus.com/inward/record.url?scp=85040570493&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85040570493&partnerID=8YFLogxK

M3  - Conference article

AN  - SCOPUS:85040570493

VL  - 2029

SP  - 16

EP  - 19

JO  - CEUR Workshop Proceedings

JF  - CEUR Workshop Proceedings

SN  - 1613-0073

ER  - 