TY - GEN
T1 - Compositional Data Augmentation for Abstractive Conversation Summarization
AU - Ouyang, Siru
AU - Chen, Jiaao
AU - Han, Jiawei
AU - Yang, Diyi
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Recent abstractive conversation summarization systems generally rely on large-scale datasets with annotated summaries. However, collecting and annotating these conversations can be a time-consuming and labor-intensive task. To address this issue, in this work, we present a sub-structure-level compositional data augmentation method, COMPO, for generating diverse and high-quality pairs of conversations and summaries. Specifically, COMPO first extracts conversation structures, such as topic splits and action triples, as basic units. We then organize these semantically meaningful conversation snippets compositionally to create new training instances. Additionally, we explore noise-tolerant settings in both self-training and joint-training paradigms to make the most of these augmented samples. Our experiments on the benchmark datasets SAMSum and DialogSum show that COMPO substantially outperforms prior baseline methods, achieving a nearly 10% increase in ROUGE scores with limited data. We have publicly released our code at https://github.com/ozyyshr/Compo.
UR - http://www.scopus.com/inward/record.url?scp=85168630955&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168630955&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85168630955
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 1471
EP - 1488
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -