What Changed? Investigating Debiasing Methods using Causal Mediation Analysis

Sullam Jeoung, Jana Diesner

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Previous work has examined how debiasing language models affects downstream tasks: specifically, how debiasing techniques influence task performance and whether debiased models also make impartial predictions in downstream tasks. However, it is not yet well understood why debiasing methods have varying impacts on downstream tasks, or how debiasing techniques affect the internal components of language models, i.e., neurons, layers, and attention heads. In this paper, we decompose the internal mechanisms of debiasing language models with respect to gender by applying causal mediation analysis to understand the influence of debiasing methods on toxicity detection as a downstream task. Our findings suggest a need to test the effectiveness of debiasing methods with different bias metrics, and to focus on changes in the behavior of certain components of the models, e.g., the first two layers of language models and attention heads.

Original language: English (US)
Title of host publication: GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
Editors: Christian Hardmeier, Christine Basta, Marta R. Costa-Jussa, Gabriel Stanovsky, Hila Gonen
Publisher: Association for Computational Linguistics (ACL)
Pages: 255-265
Number of pages: 11
ISBN (Electronic): 9781955917681
State: Published - 2022
Event: 4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022 - Seattle, United States
Duration: Jul 15 2022 → …

Publication series

Name: GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022
Country/Territory: United States
City: Seattle
Period: 7/15/22 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • General Psychology
  • Gender Studies
