To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

Daniel Campos, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sequence-to-sequence language models can be used to produce abstractive summaries that are coherent, relevant, and concise. Still, model sizes can make deployment in latency-sensitive or web-scale settings difficult. This paper studies the relationship between model size, structured pruning, inference efficiency, and summarization accuracy on widely used summarization datasets. We show that model accuracy is tied to the encoder size while inference efficiency is connected to the decoder. Using asymmetric pruning can lead to a nearly 3x improvement in inference latency with about a 1-point loss in ROUGE-2. Moreover, we find both the average degradation and the role of asymmetry to be consistent across model sizes and variations in datasets. We release our code, training regimes, and associated models for broad usage and to encourage experimentation.

Original language: English (US)
Title of host publication: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Proceedings of the Workshop
Editors: Nafise Sadat Moosavi, Iryna Gurevych, Yufang Hou, Gyuwan Kim, Young Jin Kim, Tal Schuster, Ameeta Agrawal
Publisher: Association for Computational Linguistics (ACL)
Pages: 91-109
Number of pages: 19
ISBN (Electronic): 9781959429791
State: Published - 2023
Externally published: Yes
Event: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Toronto, Canada
Duration: Jul 13 2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023
Country/Territory: Canada
City: Toronto
Period: 7/13/23 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
