TY  - CONF
T1  - SciPrompt: Knowledge-Augmented Prompting for Fine-Grained Categorization of Scientific Topics
T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
AU - You, Zhiwen
AU - Han, Kanyao
AU - Zhu, Haotian
AU  - Ludäscher, Bertram
AU - Diesner, Jana
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB  - Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has achieved performance comparable to that of full fine-tuning. Previous studies have used crafted prompt templates and verbalizers, which map from the label term space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain and fine-grained prompt-based fine-tuning with an automatically enriched verbalizer remains unexplored, mainly due to the difficulty and cost of manually selecting domain label terms for the verbalizer, which requires humans with domain expertise. To address this challenge, we introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks. To this end, we select semantically correlated and domain-specific label terms within the context of scientific literature for verbalizer augmentation. Furthermore, we propose a new verbalization strategy that uses correlation scores as additional weights to enhance the prediction performance of the language model during model tuning. Our method outperforms state-of-the-art prompt-based fine-tuning methods on scientific text classification tasks under few- and zero-shot settings, especially in classifying fine-grained and emerging scientific topics.
UR - http://www.scopus.com/inward/record.url?scp=85217811091&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85217811091&partnerID=8YFLogxK
DO - 10.18653/v1/2024.emnlp-main.350
M3 - Conference contribution
AN - SCOPUS:85217811091
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 6087
EP - 6104
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
Y2 - 12 November 2024 through 16 November 2024
ER -
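
Note: the abstract above describes scoring classes with a verbalizer whose label terms carry correlation weights. What follows is a minimal, illustrative Python sketch of that idea only, not the authors' released code: the backbone model, classes, label terms, and weights are all assumptions made up for this example, and the paper's actual retrieval and fine-tuning pipeline differs.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Hypothetical backbone; the paper's actual model may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical enriched verbalizer: class -> [(label term, correlation score)].
# In the paper such terms are retrieved automatically; here they are made up.
verbalizer = {
    "machine learning": [("learning", 1.0), ("training", 0.8)],
    "astrophysics": [("astronomy", 1.0), ("cosmology", 0.9)],
}

text = "Neural scaling laws for large language models."
prompt = f"{text} This paper is about {tokenizer.mask_token}."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Vocabulary-sized logits at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
mask_logits = logits[0, mask_pos]

# Class score = correlation-weighted average of its label terms' logits,
# averaging over sub-word pieces for multi-token terms.
scores = {}
for cls, terms in verbalizer.items():
    term_logits = torch.stack([
        mask_logits[tokenizer(t, add_special_tokens=False)["input_ids"]].mean()
        for t, _ in terms
    ])
    weights = torch.tensor([w for _, w in terms])
    scores[cls] = (weights * term_logits).sum() / weights.sum()

print(max(scores, key=scores.get))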