oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings, improved distillation, and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT in its response to pruning during pre-training and fine-tuning, and find it less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERT-base and exceed the performance of Prune OFA Large on the SQuAD v1.1 question-answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.

Original language: English (US)
Title of host publication: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Proceedings of the Workshop
Editors: Nafise Sadat Moosavi, Iryna Gurevych, Yufang Hou, Gyuwan Kim, Young Jin Kim, Tal Schuster, Ameeta Agrawal
Publisher: Association for Computational Linguistics (ACL)
Pages: 39-58
Number of pages: 20
ISBN (Electronic): 9781959429791
State: Published - 2023
Externally published: Yes
Event: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Toronto, Canada
Duration: Jul 13 2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023
Country/Territory: Canada
City: Toronto
Period: 7/13/23 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
