TY - GEN
T1 - Automatic Entity Recognition and Typing in Massive Text Corpora
AU - Ren, Xiang
AU - El-Kishky, Ahmed
AU - Wang, Chi
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2016 owner/author(s).
PY - 2016/4/11
Y1 - 2016/4/11
AB - In today's computerized and information-based society, we are inundated with vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text corpora (especially in massive, domain-specific text corpora). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., people, product, food) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.
KW - entity recognition and typing
KW - massive text corpora
UR - http://www.scopus.com/inward/record.url?scp=85047801459&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047801459&partnerID=8YFLogxK
U2 - 10.1145/2872518.2891065
DO - 10.1145/2872518.2891065
M3 - Conference contribution
AN - SCOPUS:85047801459
T3 - WWW 2016 Companion - Proceedings of the 25th International Conference on World Wide Web
SP - 1025
EP - 1028
BT - WWW 2016 Companion - Proceedings of the 25th International Conference on World Wide Web
PB - Association for Computing Machinery, Inc
T2 - 25th International Conference on World Wide Web, WWW 2016
Y2 - 11 April 2016 through 15 April 2016
ER -