Keep meeting summaries on topic: Abstractive multi-modal meeting summarization

Manling Li, Lingyu Zhang, Heng Ji, Richard J. Radke

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Transcripts of natural, multi-person meetings differ significantly from documents like news articles, which can cause Natural Language Generation models to generate unfocused summaries. We develop an abstractive meeting summarizer that draws on both the video and the audio of meeting recordings. Specifically, we propose a multi-modal hierarchical attention mechanism across three levels: topic segment, utterance, and word. To narrow the focus to topically relevant segments, we jointly model topic segmentation and summarization. In addition to traditional textual features, we introduce new multi-modal features derived from visual focus of attention, based on the assumption that an utterance is more important if its speaker receives more attention. Experiments show that our model significantly outperforms the state of the art on both BLEU and ROUGE measures.
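The three-level attention described in the abstract can be pictured with a small sketch. This is not the authors' model: it only illustrates one common way to combine attention across levels, by normalizing scores within each parent and multiplying the weights down the hierarchy. The function names, the additive use of a visual-focus-of-attention share, and the `vfoa_weight` parameter are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def utterance_score(text_score, vfoa_share, vfoa_weight=1.0):
    """Hypothetical utterance score: a textual score plus a bonus that grows
    with the share of participants looking at the speaker (an assumption
    standing in for the abstract's visual-focus-of-attention features)."""
    return text_score + vfoa_weight * vfoa_share


def hierarchical_attention(word_scores, utterance_scores, segment_scores):
    """Combine attention over topic segments, utterances, and words.

    word_scores[s][u] holds raw scores for the words of utterance u in
    segment s; utterance_scores[s] holds raw scores for the utterances of
    segment s; segment_scores holds one raw score per topic segment.
    Returns word weights in the same nested shape; they sum to 1 overall.
    """
    seg_w = softmax(segment_scores)
    weights = []
    for s, segment in enumerate(word_scores):
        utt_w = softmax(utterance_scores[s])
        weights.append([
            [seg_w[s] * utt_w[u] * w for w in softmax(words)]
            for u, words in enumerate(segment)
        ])
    return weights


# Two topic segments; the second utterance of the first segment has a
# speaker who receives more visual attention, so it gets a higher score.
utt_scores = [
    [utterance_score(0.5, 0.0), utterance_score(0.5, 1.0)],
    [utterance_score(0.2, 0.0)],
]
example = hierarchical_attention(
    word_scores=[[[0.1, 0.2], [0.3]], [[0.0, 0.0, 0.0]]],
    utterance_scores=utt_scores,
    segment_scores=[1.0, 0.0],
)
```

Because each level is normalized within its parent, a word in an off-topic segment can receive little weight even when its word-level score is high, which is the intuition behind keeping summaries on topic.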

Original language: English (US)
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2190-2196
Number of pages: 7
ISBN (Electronic): 9781950737482
State: Published - 2020
Externally published: Yes
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: Jul 28 2019 - Aug 2 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country/Territory: Italy
City: Florence
Period: 7/28/19 - 8/2/19

ASJC Scopus subject areas

  • Language and Linguistics
  • General Computer Science
  • Linguistics and Language
