TY - GEN
T1 - oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
T2 - 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023
AU - Campos, Daniel
AU - Marques, Alexandre
AU - Kurtz, Mark
AU - Zhai, ChengXiang
N1 - Publisher Copyright:
© 2023 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In this paper, we introduce the oBERTa family of language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models that are between 3.8 and 24.3 times faster without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization and leverages frozen embeddings, improved distillation, and better model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and we find it less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERT-base and exceed the performance of Prune OFA Large on the SQuAD v1.1 question-answering dataset, while being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.
AB - In this paper, we introduce the oBERTa family of language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models that are between 3.8 and 24.3 times faster without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization and leverages frozen embeddings, improved distillation, and better model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and we find it less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERT-base and exceed the performance of Prune OFA Large on the SQuAD v1.1 question-answering dataset, while being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage broad usage and experimentation.
UR - http://www.scopus.com/inward/record.url?scp=85175875038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175875038&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.sustainlp-1.3
DO - 10.18653/v1/2023.sustainlp-1.3
M3 - Conference contribution
AN - SCOPUS:85175875038
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 39
EP - 58
BT - 4th Workshop on Simple and Efficient Natural Language Processing, SustaiNLP 2023 - Proceedings of the Workshop
A2 - Moosavi, Nafise Sadat
A2 - Gurevych, Iryna
A2 - Hou, Yufang
A2 - Kim, Gyuwan
A2 - Kim, Young Jin
A2 - Schuster, Tal
A2 - Agrawal, Ameeta
PB - Association for Computational Linguistics (ACL)
Y2 - 13 July 2023
ER -