Abstract
Large-scale pre-trained language models such as BERT and RoBERTa have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable facing the threats of textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) an Anchored Feature regularizer, which increases the mutual information between local stable features and global features. We provide a principled way to theoretically analyze and improve the robustness of language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy over several adversarial datasets on Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.
Original language | English (US) |
---|---|
State | Published - 2021 |
Event | 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online Duration: May 3 2021 → May 7 2021 |
Conference
Conference | 9th International Conference on Learning Representations, ICLR 2021 |
---|---|
City | Virtual, Online |
Period | 5/3/21 → 5/7/21 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language