Bringing structure to text: Mining phrases, entities, topics, and hierarchies

Jiawei Han, Chi Wang, Ahmed El-Kishky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Mining phrases, entity concepts, topics, and hierarchies from massive text corpora is an essential problem in the age of big data. Text data in electronic form is ubiquitous, ranging from scientific articles to social networks, enterprise logs, news articles, social media and general web pages. It is highly desirable but challenging to bring structure to unstructured text data, uncover underlying hierarchies, relationships, patterns and trends, and gain knowledge from such data. In this tutorial, we provide a comprehensive survey of the state of the art in data-driven methods that automatically mine phrases, extract and infer latent structures from text corpora, and construct multi-granularity topical groupings and hierarchies of the underlying themes. We study their principles, methodologies, algorithms and applications using several real datasets, including research papers and news articles, and demonstrate how these methods work and how the uncovered latent entity structures may help text understanding, knowledge discovery and management.

Original language: English (US)
Title of host publication: KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1968
Number of pages: 1
ISBN (Print): 9781450329569
State: Published - 2014
Event: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States
Duration: Aug 24 2014 to Aug 27 2014

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Country/Territory: United States
City: New York, NY
Period: 8/24/14 to 8/27/14

Keywords

  • information networks
  • phrase mining
  • text mining
  • topic model

ASJC Scopus subject areas

  • Software
  • Information Systems
