SumDoCs: Surrounding-aware unsupervised multi-document summarization

Qi Zhu, Fang Guo, Jingjing Tian, Yuning Mao, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Multi-document summarization, which condenses a set of documents into a small number of phrases or sentences, captures the concise, critical essence of those documents. Existing multi-document summarization methods ignore the fact that there often exist many relevant documents that provide surrounding background knowledge, which can help generate a salient and discriminative summary for a given set of documents. In this paper, we propose a novel method, SUMDocS (Surrounding-aware Unsupervised Multi-Document Summarization), which incorporates rich surrounding (topically related) documents to improve the quality of extractive summarization without human supervision. Specifically, we propose a joint optimization algorithm that unifies global novelty (i.e., category-level frequent and discriminative), local consistency (i.e., locally frequent, co-occurring), and local saliency (i.e., salient from its surroundings) so that the obtained summary captures the characteristics of the target documents. Extensive experiments in the news and scientific domains demonstrate the superior performance of our method when the unlabeled surrounding corpus is utilized.
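The abstract does not spell out the joint optimization itself. As a rough illustration of what "locally frequent" and "salient from its surroundings" can mean for extractive summarization, here is a minimal Python sketch under stated assumptions. It is NOT the SUMDocS algorithm: the function names `term_scores` and `summarize` are hypothetical, local consistency is approximated by the fraction of target documents containing a term, local saliency by an add-one-smoothed log ratio of target vs. surrounding term frequency, global novelty is omitted entirely, and sentences are ranked greedily rather than jointly optimized.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a real system would mine phrases instead.
    return re.findall(r"[a-z]+", text.lower())


def term_scores(target_docs, surrounding_docs):
    """Hypothetical term weights: consistency * saliency (not the paper's objective)."""
    target_counts = Counter(t for d in target_docs for t in tokenize(d))
    surround_counts = Counter(t for d in surrounding_docs for t in tokenize(d))
    doc_sets = [set(tokenize(d)) for d in target_docs]
    n_target = sum(target_counts.values())
    n_surround = sum(surround_counts.values())
    vocab_size = len(set(target_counts) | set(surround_counts))
    scores = {}
    for term in target_counts:
        # Assumed "local consistency": share of target documents containing the term.
        consistency = sum(term in s for s in doc_sets) / len(doc_sets)
        # Assumed "local saliency": smoothed log ratio of target vs. surrounding frequency.
        p_target = (target_counts[term] + 1) / (n_target + vocab_size)
        p_surround = (surround_counts[term] + 1) / (n_surround + vocab_size)
        saliency = math.log(p_target / p_surround)
        # Terms no more frequent than in the surroundings get zero weight.
        scores[term] = consistency * max(saliency, 0.0)
    return scores


def summarize(target_docs, surrounding_docs, k=2):
    """Greedy extractive stand-in: return the k sentences with the highest mean term weight."""
    scores = term_scores(target_docs, surrounding_docs)
    sentences = [
        s.strip()
        for d in target_docs
        for s in re.split(r"(?<=[.!?])\s+", d)
        if s.strip()
    ]

    def sentence_score(sentence):
        tokens = tokenize(sentence)
        return sum(scores.get(t, 0.0) for t in tokens) / max(len(tokens), 1)

    return sorted(sentences, key=sentence_score, reverse=True)[:k]
```

For example, with target documents about a vaccine trial and surrounding documents that also mention markets and weather, the sentences about markets and weather score near zero because those terms are at least as common in the surroundings, so the sketch keeps "The vaccine trial enrolled 500 volunteers." and "The vaccine trial reported strong immune response." The point of the contrast term is that frequency alone would not separate shared background from content specific to the target set.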

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining, SDM 2021
Publisher: Siam Society
Pages: 477-485
Number of pages: 9
ISBN (Electronic): 9781611976700
State: Published - 2021
Event: 2021 SIAM International Conference on Data Mining, SDM 2021 - Virtual, Online
Duration: Apr 29 2021 to May 1 2021

Publication series

Name: SIAM International Conference on Data Mining, SDM 2021

Conference

Conference: 2021 SIAM International Conference on Data Mining, SDM 2021
City: Virtual, Online
Period: 4/29/21 to 5/1/21

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
