TY - GEN
T1 - Mining latent entity structures from massive unstructured and interconnected data
AU - Han, Jiawei
AU - Wang, Chi
PY - 2014
Y1 - 2014
N2 - The "big data" era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge, to social media, news, and everyone's daily life. Examples of such collections include scientific publications, enterprise logs, news articles, social media and general Web pages. Valuable knowledge about multi-typed entities is often hidden in the unstructured or loosely structured but interconnected data. Mining latent structured information around entities uncovers semantic structures from massive unstructured data and hence enables many high-impact applications. In this tutorial, we summarize the closely related literature in database systems, data mining, Web, information extraction, information retrieval, and natural language processing, overview a spectrum of data-driven methods that extract and infer such latent structures, from an interdisciplinary point of view, and demonstrate how these structures support entity discovery and management, data understanding, and some new database applications. We present three categories of studies: mining conceptual, topical and relational structures. Moreover, we present case studies on real datasets, including research papers, news articles and social networks, and show how interesting and organized knowledge can be discovered by mining latent entity structures from these datasets.
AB - The "big data" era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge, to social media, news, and everyone's daily life. Examples of such collections include scientific publications, enterprise logs, news articles, social media and general Web pages. Valuable knowledge about multi-typed entities is often hidden in the unstructured or loosely structured but interconnected data. Mining latent structured information around entities uncovers semantic structures from massive unstructured data and hence enables many high-impact applications. In this tutorial, we summarize the closely related literature in database systems, data mining, Web, information extraction, information retrieval, and natural language processing, overview a spectrum of data-driven methods that extract and infer such latent structures, from an interdisciplinary point of view, and demonstrate how these structures support entity discovery and management, data understanding, and some new database applications. We present three categories of studies: mining conceptual, topical and relational structures. Moreover, we present case studies on real datasets, including research papers, news articles and social networks, and show how interesting and organized knowledge can be discovered by mining latent entity structures from these datasets.
KW - Entity knowledge engineering
KW - Latent structure
UR - http://www.scopus.com/inward/record.url?scp=84904352389&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904352389&partnerID=8YFLogxK
U2 - 10.1145/2588555.2588890
DO - 10.1145/2588555.2588890
M3 - Conference contribution
AN - SCOPUS:84904352389
SN - 9781450323765
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1409
EP - 1410
BT - SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
Y2 - 22 June 2014 through 27 June 2014
ER -