A hybrid hierarchical model for multi-document summarization

Asli Celikyilmaz, Dilek Hakkani-Tur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem, building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances the current state of the art, improving ROUGE scores by ∼7%. Based on manual quality evaluations, the generated summaries are less redundant and more coherent.
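The second of the two steps described in the abstract can be illustrated with a minimal Python sketch. It assumes the per-sentence target scores have already been produced by a topic model (the numbers below are made-up toy values), uses two hypothetical features (scaled sentence length and relative position), and substitutes plain least-squares regression fitted by gradient descent for whatever regressor the paper actually uses. All function names are illustrative and do not come from the paper.

```python
# Toy sketch of step two: regress sentence scores on surface features,
# then rank unseen sentences. Data, features, and regressor are assumptions.

def features(sentence, position, doc_len):
    """Hypothetical lexical/structural features for one sentence."""
    words = sentence.split()
    return [
        1.0,                               # bias term
        len(words) / 20.0,                 # scaled sentence length
        1.0 - position / max(doc_len, 1),  # larger for earlier sentences
    ]

def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Score one feature vector with the learned weights."""
    return sum(wj * xj for wj, xj in zip(w, x))

def summarize(sentences, w, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    n = len(sentences)
    scored = [(predict(w, features(s, i, n)), i) for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]

# Toy training data: the scores stand in for step-one (topic model) output.
train_sentences = [
    "The council approved the new flood defence budget on Monday",
    "Officials said work would begin next spring",
    "It rained",
    "Residents had asked for the defences after two floods in five years",
]
train_scores = [0.9, 0.5, 0.1, 0.7]  # made-up values, not from the paper

X = [features(s, i, len(train_sentences)) for i, s in enumerate(train_sentences)]
w = fit_linear(X, train_scores)

new_doc = [
    "A second town has requested similar funding from the council",
    "Yes",
    "A decision on that request is expected later this year",
]
summary = summarize(new_doc, w, k=2)
```

The sketch keeps the two stages decoupled: any scorer that yields one number per training sentence could replace `train_scores`, and any regressor could replace `fit_linear`, without changing `summarize`.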

Original language: English (US)
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Conference Proceedings
Editors: Jan Hajic, Sandra Carberry, Stephen Clark
Publisher: Association for Computational Linguistics (ACL)
Pages: 815-824
Number of pages: 10
ISBN (Electronic): 1932432663, 9781932432664
State: Published - 2010
Externally published: Yes
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: Jul 11 2010 to Jul 16 2010

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 2010-July
ISSN (Print): 0736-587X

Other

Other: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country/Territory: Sweden
City: Uppsala
Period: 7/11/10 to 7/16/10

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
