TY - GEN
T1 - What Changed? Investigating Debiasing Methods using Causal Mediation Analysis
AU - Jeoung, Sullam
AU - Diesner, Jana
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Previous work has examined how debiasing language models affects downstream tasks, specifically how debiasing techniques influence task performance and whether debiased models also make impartial predictions in downstream tasks. However, we do not yet understand well why debiasing methods have varying impacts on downstream tasks, or how debiasing techniques affect internal components of language models, i.e., neurons, layers, and attention heads. In this paper, we decompose the internal mechanisms of debiasing language models with respect to gender by applying causal mediation analysis, in order to understand the influence of debiasing methods on toxicity detection as a downstream task. Our findings suggest a need to test the effectiveness of debiasing methods with different bias metrics, and to focus on changes in the behavior of certain components of the models, e.g., the first two layers of language models and attention heads.
UR - http://www.scopus.com/inward/record.url?scp=85137602903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137602903&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.gebnlp-1.26
DO - 10.18653/v1/2022.gebnlp-1.26
M3 - Conference contribution
AN - SCOPUS:85137602903
T3 - GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
SP - 255
EP - 265
BT - GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
A2 - Hardmeier, Christian
A2 - Basta, Christine
A2 - Costa-Jussa, Marta R.
A2 - Stanovsky, Gabriel
A2 - Gonen, Hila
PB - Association for Computational Linguistics (ACL)
T2 - 4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022
Y2 - 15 July 2022
ER -