TY - GEN
T1 - Beyond Binary Gender Labels
T2 - 5th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2024, held in conjunction with the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
AU - You, Zhiwen
AU - Lee, Hae Jin
AU - Mishra, Shubhanshu
AU - Jeoung, Sullam
AU - Mishra, Apratim
AU - Kim, Jinseok
AU - Diesner, Jana
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Name-based gender prediction has traditionally categorized individuals as either female or male based on their names, using a binary classification system. This binary approach can be problematic in the case of gender-neutral names that do not align with any one gender, among other reasons. Relying solely on binary gender categories without recognizing gender-neutral names can reduce the inclusiveness of gender prediction tasks. We introduce an additional gender category, i.e., “neutral”, to study and address potential gender biases in Large Language Models (LLMs). We evaluate the performance of several foundation models and LLMs in predicting gender based on first names only. Additionally, we investigate the impact of adding birth years to enhance the accuracy of gender prediction, accounting for shifting associations between names and genders over time. Our findings indicate that most LLMs identify male and female names with high accuracy (over 80%) but struggle with gender-neutral names (under 40%), and that the accuracy of gender prediction is higher for English-based first names than for non-English names. The experimental results show that incorporating the birth year does not improve the overall accuracy of gender prediction, especially for names with evolving gender associations. We recommend caution when applying LLMs to gender identification in downstream tasks, particularly when dealing with non-binary gender labels.
UR - http://www.scopus.com/inward/record.url?scp=85204400954&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204400954&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85204400954
T3 - GeBNLP 2024 - 5th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
SP - 255
EP - 268
BT - GeBNLP 2024 - 5th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
A2 - Falenska, Agnieszka
A2 - Basta, Christine
A2 - Costa-jussà, Marta
A2 - Goldfarb-Tarrant, Seraphina
A2 - Nozza, Debora
PB - Association for Computational Linguistics (ACL)
Y2 - 16 August 2024
ER -