TY - GEN
T1 - Examining the Causal Effect of First Names on Language Models
T2 - 3rd Workshop on Trustworthy Natural Language Processing, TrustNLP 2023, co-located with ACL 2023
AU - Jeoung, Sullam
AU - Diesner, Jana
AU - Kilicoglu, Halil
N1 - Publisher Copyright:
© 2023 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All rights reserved.
PY - 2023
Y1 - 2023
AB - As language models continue to be integrated into applications of personal and societal relevance, ensuring these models' trustworthiness is crucial, particularly with respect to producing consistent outputs regardless of sensitive attributes. Given that first names may serve as proxies for (intersectional) socio-demographic representations, it is imperative to examine the impact of first names on commonsense reasoning capabilities. In this paper, we study whether a model's reasoning given a specific input differs based on the first names provided. Our underlying assumption is that the reasoning about Alice should not differ from the reasoning about James. We propose and implement a controlled experimental framework to measure the causal effect of first names on commonsense reasoning, enabling us to distinguish model predictions that arise by chance from those caused by the actual factors of interest. Our results indicate that the frequency of first names has a direct effect on model prediction, with less frequent names yielding divergent predictions compared to more frequent names. To gain insights into the internal mechanisms of models that contribute to these behaviors, we also conduct an in-depth explainable analysis. Overall, our findings suggest that to ensure model robustness, it is essential to augment datasets with more diverse first names during the configuration stage.
UR - http://www.scopus.com/inward/record.url?scp=85175451550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175451550&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85175451550
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 61
EP - 72
BT - 3rd Workshop on Trustworthy Natural Language Processing, TrustNLP 2023 - Proceedings of the Workshop
A2 - Ovalle, Anaelia
A2 - Chang, Kai-Wei
A2 - Mehrabi, Ninareh
A2 - Pruksachatkun, Yada
A2 - Galstyan, Aram
A2 - Dhamala, Jwala
A2 - Verma, Apurv
A2 - Cao, Trista
A2 - Kumar, Anoop
A2 - Gupta, Rahul
PB - Association for Computational Linguistics (ACL)
Y2 - 14 July 2023
ER -