Mining latent entity structures from massive unstructured and interconnected data

Jiawei Han, Chi Wang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

The "big data" era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge to social media, news, and everyone's daily life. Examples of such collections include scientific publications, enterprise logs, news articles, social media, and general Web pages. Valuable knowledge about multi-typed entities is often hidden in unstructured or loosely structured but interconnected data. Mining latent structured information around entities uncovers semantic structures from massive unstructured data and hence enables many high-impact applications. In this tutorial, we summarize the closely related literature in database systems, data mining, Web, information extraction, information retrieval, and natural language processing; overview, from an interdisciplinary point of view, a spectrum of data-driven methods that extract and infer such latent structures; and demonstrate how these structures support entity discovery and management, data understanding, and some new database applications. We present three categories of studies: mining conceptual, topical, and relational structures. Moreover, we present case studies on real datasets, including research papers, news articles, and social networks, and show how interesting and organized knowledge can be discovered by mining latent entity structures from these datasets.

Original language: English (US)
Title of host publication: SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1409-1410
Number of pages: 2
ISBN (Print): 9781450323765
State: Published - 2014
Event: 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014 - Snowbird, UT, United States
Duration: Jun 22 2014 to Jun 27 2014

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
Country: United States
City: Snowbird, UT
Period: 6/22/14 to 6/27/14

Keywords

  • Entity knowledge engineering
  • Latent structure

ASJC Scopus subject areas

  • Software
  • Information Systems
