TY - GEN
T1 - SUMDocS: Surrounding-aware Unsupervised Multi-Document Summarization
T2 - 2021 SIAM International Conference on Data Mining, SDM 2021
AU - Zhu, Qi
AU - Guo, Fang
AU - Tian, Jingjing
AU - Mao, Yuning
AU - Han, Jiawei
N1 - Funding Information:
Research was sponsored in part by US DARPA KAIROS Program No. FA8750-19-2-1004 and SocialSim Program No. W911NF-17-C-0099, National Science Foundation IIS-19-56151, IIS-17-41317, IIS-17-04532, and IIS-16-18481, and DTRA HDTRA11810026.
Publisher Copyright:
© 2021 by SIAM.
PY - 2021
Y1 - 2021
AB - Multi-document summarization, which summarizes a set of documents with a small number of phrases or sentences, provides a concise and critical essence of the documents. Existing multi-document summarization methods ignore the fact that there often exist many relevant documents that provide surrounding background knowledge, which can help generate a salient and discriminative summary for a given set of documents. In this paper, we propose a novel method, SUMDocS (Surrounding-aware Unsupervised Multi-Document Summarization), which incorporates rich surrounding (topically related) documents to help improve the quality of extractive summarization without human supervision. Specifically, we propose a joint optimization algorithm to unify global novelty (i.e., category-level frequent and discriminative), local consistency (i.e., locally frequent, co-occurring), and local saliency (i.e., salient from its surroundings) such that the obtained summary captures the characteristics of the target documents. Extensive experiments on news and scientific domains demonstrate the superior performance of our method when the unlabeled surrounding corpus is utilized.
UR - http://www.scopus.com/inward/record.url?scp=85120960502&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120960502&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85120960502
T3 - SIAM International Conference on Data Mining, SDM 2021
SP - 477
EP - 485
BT - SIAM International Conference on Data Mining, SDM 2021
PB - Society for Industrial and Applied Mathematics
Y2 - 29 April 2021 through 1 May 2021
ER -