TY - JOUR
T1 - Counterfactually Fair Automatic Speech Recognition
AU - Sarı, Leda
AU - Hasegawa-Johnson, Mark
AU - Yoo, Chang D.
N1 - Funding Information:
This work was supported in part by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under Grant 2019-0-01396 (Development of Framework for Analyzing, Detecting, Mitigating of Bias in AI Model and Training Data).
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than for others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education, or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize changes in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a lower performance gap between demographic groups and a lower error standard deviation among individuals.
AB - Widely used automatic speech recognition (ASR) systems have been empirically demonstrated in various studies to be unfair, having higher error rates for some groups of users than for others. One way to define fairness in ASR is to require that changing the demographic group affiliation of any individual (e.g., changing their gender, age, education, or race) should not change the probability distribution across possible speech-to-text transcriptions. In the paradigm of counterfactual fairness, all variables independent of group affiliation (e.g., the text being read by the speaker) remain unchanged, while variables dependent on group affiliation (e.g., the speaker's voice) are counterfactually modified. Hence, we approach the fairness of ASR by training the ASR to minimize changes in its outcome probabilities despite a counterfactual change in the individual's demographic attributes. Starting from the individualized counterfactual equal odds criterion, we provide relaxations of it and compare their performance for connectionist temporal classification (CTC) based end-to-end ASR systems. We perform our experiments on the Corpus of Regional African American Language (CORAAL) and the LibriSpeech dataset to account for differences due to gender, age, education, and race. We show that with counterfactual training, we can reduce average character error rates while achieving a lower performance gap between demographic groups and a lower error standard deviation among individuals.
KW - Automatic speech recognition
KW - counterfactual fairness
KW - fairness in machine learning
KW - speaker adaptation
UR - http://www.scopus.com/inward/record.url?scp=85119948304&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119948304&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2021.3126949
DO - 10.1109/TASLP.2021.3126949
M3 - Article
AN - SCOPUS:85119948304
SN - 2329-9290
VL - 29
SP - 3515
EP - 3525
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -
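
The abstract's training objective can be read as a standard CTC loss augmented with a consistency penalty that discourages the model's output distribution from shifting when the input is swapped for a counterfactual version of the same utterance (same text, group-dependent voice characteristics altered). Below is a minimal PyTorch sketch under that assumption; the KL-based discrepancy, the weight lam, and the stand-in counterfactual features feats_cf are illustrative choices, not the paper's exact relaxations of the counterfactual equal odds criterion.

    # Minimal sketch of counterfactual-consistency training for a CTC-based ASR model.
    # Assumptions (not from the paper): the discrepancy is a per-frame KL divergence,
    # and feats_cf stands in for counterfactually modified input features.
    import torch
    import torch.nn.functional as F

    def counterfactual_consistency_loss(log_probs, log_probs_cf):
        # KL between output distributions on the original and counterfactual inputs.
        return F.kl_div(log_probs_cf, log_probs, log_target=True, reduction="batchmean")

    def training_step(model, feats, feats_cf, targets, in_lens, tgt_lens, lam=0.1):
        # log_probs: (T, N, C), as required by torch's CTC loss.
        log_probs = model(feats).log_softmax(dim=-1)
        log_probs_cf = model(feats_cf).log_softmax(dim=-1)

        # Standard CTC objective on the original utterance.
        ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens, blank=0)

        # Penalize changes in the outcome distribution under the counterfactual change.
        return ctc + lam * counterfactual_consistency_loss(log_probs, log_probs_cf)

    if __name__ == "__main__":
        # Toy demo with a linear stand-in "model" and random features (shapes only).
        torch.manual_seed(0)
        model = torch.nn.Linear(40, 32)                     # 31 labels + blank
        feats = torch.randn(50, 4, 40)                      # (T, N, feat_dim)
        feats_cf = feats + 0.01 * torch.randn_like(feats)   # stand-in counterfactual
        targets = torch.randint(1, 32, (4, 12))
        in_lens = torch.full((4,), 50, dtype=torch.long)
        tgt_lens = torch.full((4,), 12, dtype=torch.long)
        print(training_step(model, feats, feats_cf, targets, in_lens, tgt_lens).item())

The design intuition follows the abstract directly: the CTC term preserves transcription accuracy, while the consistency term trains the recognizer so that a counterfactual change in demographic attributes leaves its outcome probabilities (approximately) unchanged.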