TY - GEN
T1 - Mining News Events from Comparable News Corpora
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
AU - Kim, Hyungsul
AU - El-Kishky, Ahmed
AU - Ren, Xiang
AU - Han, Jiawei
N1 - Funding Information:
Research was sponsored in part by U.S. Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), DARPA under Agreements No. W911NF-17-C-0099 and FA8750-19-2-1004, National Science Foundation IIS 16-18481, IIS 17-04532, and IIS-17-41317, DTRA HDTRA11810026, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov). Any opinions, findings, and conclusions or recommendations expressed in this document are those of the author(s) and should not be interpreted as the views of any U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - We present ProxiModel, a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data sources. The proposed model differentiates itself from other approaches by modeling both the event correlation within each individual document as well as across the corpus. To facilitate this, we introduce the concept of a proximity-network, a novel space-efficient data structure to facilitate scalable event mining. This proximity network captures the corpus-level co-occurence statistics for candidate event descriptors, event attributes, as well as their connections. We probabilistically model the proximity network as a generative process with sparsity-inducing regularization. This allows us to efficiently and effectively extract high-quality and interpretable news events. Experiments on three different news corpora demonstrate that the proposed method is effective and robust at generating high-quality event descriptors and attributes. We briefly detail many interesting applications from our proposed framework such as news summarization, event tracking and multi-dimensional analysis on news. Finally, we explore a case study on visualizing the events for a Japan Tsunami news corpus and demonstrate ProxiModel's ability to automatically summarize emerging news events.
AB - We present ProxiModel, a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data sources. The proposed model differentiates itself from other approaches by modeling both the event correlation within each individual document as well as across the corpus. To facilitate this, we introduce the concept of a proximity-network, a novel space-efficient data structure to facilitate scalable event mining. This proximity network captures the corpus-level co-occurence statistics for candidate event descriptors, event attributes, as well as their connections. We probabilistically model the proximity network as a generative process with sparsity-inducing regularization. This allows us to efficiently and effectively extract high-quality and interpretable news events. Experiments on three different news corpora demonstrate that the proposed method is effective and robust at generating high-quality event descriptors and attributes. We briefly detail many interesting applications from our proposed framework such as news summarization, event tracking and multi-dimensional analysis on news. Finally, we explore a case study on visualizing the events for a Japan Tsunami news corpus and demonstrate ProxiModel's ability to automatically summarize emerging news events.
UR - http://www.scopus.com/inward/record.url?scp=85081310565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081310565&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006049
DO - 10.1109/BigData47090.2019.9006049
M3 - Conference contribution
AN - SCOPUS:85081310565
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 105
EP - 114
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 December 2019 through 12 December 2019
ER -