Open-schema event profiling for massive news corpora

Quan Yuan, Xiang Ren, Wenqi He, Chao Zhang, Xinhe Geng, Lifu Huang, Heng Ji, Chin Yew Lin, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

With the rapid growth of online information services, a massive volume of news data has become available. To help people quickly digest this information, we define a new problem, schema-based news event profiling: profiling events reported in open-domain news corpora with a set of slots and slot-value pairs for each event, where the set of slots forms the schema of an event type. Such profiling not only provides readers with concise views of events, but also facilitates various applications such as information retrieval, knowledge graph construction and question answering. It is, however, a challenging task. The first challenge is to discover events and event types, because both are initially unknown. The second difficulty is the lack of pre-defined event-type schemas. Lastly, even with the schemas extracted, generating event profiles from them remains essential yet demanding. To address these challenges, we propose a fully automatic, unsupervised, three-step framework to obtain event profiles. First, we develop a Bayesian non-parametric model to detect events and event types by exploiting the slot expressions of the entities mentioned in news articles. Second, we propose an unsupervised embedding model for schema induction that encodes the following insight: an entity may serve as the value of multiple slots in an event, but if it appears in more sentences along with the same set of more entities in the event, its slots in these sentences tend to be similar. Finally, we build event profiles by extracting slot values for each event based on the slots' expression patterns. To the best of our knowledge, this is the first work on schema-based profiling for news events. Experimental results on a large news corpus demonstrate the superior performance of our method over state-of-the-art baselines on event detection, schema induction and event profiling.
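The abstract describes the output only in words: each event type has a schema (a set of slots), and each event is profiled with slot-value pairs. As a rough illustration of that data shape, and not the authors' implementation, it might be represented as below. The class names, the `fill` and `unfilled` helpers, and the example slot names and values are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventSchema:
    """Schema of an event type: the set of slots shared by its events."""
    event_type: str
    slots: frozenset


@dataclass
class EventProfile:
    """One event, described by slot-value pairs drawn from its type's schema."""
    schema: EventSchema
    values: dict = field(default_factory=dict)

    def fill(self, slot, value):
        # Only slots that belong to the event type's schema may be filled.
        if slot not in self.schema.slots:
            raise KeyError(f"slot {slot!r} is not in schema {self.schema.event_type!r}")
        # A slot may hold several values, so values are kept in a list.
        self.values.setdefault(slot, []).append(value)

    def unfilled(self):
        # Slots of the schema for which no value has been extracted yet.
        return sorted(self.schema.slots - set(self.values))


# Hypothetical event type and slot names, purely for illustration.
schema = EventSchema("earthquake", frozenset({"location", "magnitude", "casualties"}))
profile = EventProfile(schema)
profile.fill("location", "example-city")
profile.fill("magnitude", "example-magnitude")
```

In this sketch the schema is fixed up front, whereas in the setting the abstract describes, both the event types and their schemas would have to be induced from the corpus before any profile could be filled.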

Original language: English (US)
Title of host publication: CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Editors: Norman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
Publisher: Association for Computing Machinery
Pages: 587-596
Number of pages: 10
ISBN (Electronic): 9781450360142
DOI: https://doi.org/10.1145/3269206.3271674
State: Published - Oct 17 2018
Event: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: Oct 22 2018 - Oct 26 2018

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country: Italy
City: Torino
Period: 10/22/18 - 10/26/18

ASJC Scopus subject areas

  • Business, Management and Accounting(all)
  • Decision Sciences(all)

Cite this

Yuan, Q., Ren, X., He, W., Zhang, C., Geng, X., Huang, L., Ji, H., Lin, C. Y., & Han, J. (2018). Open-schema event profiling for massive news corpora. In N. Paton, S. Candan, H. Wang, J. Allan, R. Agrawal, A. Labrinidis, A. Cuzzocrea, M. Zaki, D. Srivastava, A. Broder, & A. Schuster (Eds.), CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 587-596). (International Conference on Information and Knowledge Management, Proceedings). Association for Computing Machinery. https://doi.org/10.1145/3269206.3271674