Mining Structures from Massive Text Data: A Data-Driven Approach

Research output: Contribution to journal › Conference article

Abstract

Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to mine structures from such massive unstructured data and to transform it into structured networks and actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, that types can be extracted from it with distant supervision, and that entity-attribute-value triples can be extracted via meta-patterns discovered from such data. Finally, we propose a data-to-network-to-knowledge paradigm: first turn data into relatively structured information networks, and then mine these text-rich and structure-rich networks to generate useful knowledge. We show that this paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.

Original language: English (US)
Pages (from-to): 16-19
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 2029
State: Published - Jan 1 2017
Event: 4th Annual International Symposium on Information Management and Big Data, SIMBig 2017 - Lima, Peru
Duration: Sep 4 2017 to Sep 6 2017

ASJC Scopus subject areas

  • Computer Science (all)
