TY - GEN
T1 - A hybrid hierarchical model for multi-document summarization
AU - Celikyilmaz, Asli
AU - Hakkani-Tur, Dilek
PY - 2010
Y1 - 2010
AB - Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem, building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances the current state of the art, improving ROUGE scores by ∼7%. Generated summaries are less redundant and more coherent based upon manual quality evaluations.
UR - http://www.scopus.com/inward/record.url?scp=84860001749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860001749&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84860001749
SN - 9781617388088
T3 - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 815
EP - 824
BT - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
T2 - 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Y2 - 11 July 2010 through 16 July 2010
ER -