TY - JOUR
T1 - Seamless equal accuracy ratio for inclusive CTC speech recognition
AU - Gao, Heting
AU - Wang, Xiaoxuan
AU - Kang, Sunghun
AU - Mina, Rusty
AU - Issa, Dias
AU - Harvill, John
AU - Sari, Leda
AU - Hasegawa-Johnson, Mark
AU - Yoo, Chang D.
N1 - Funding Information:
This work was supported by the Korean Institute for Information & Communications Technology Planning & Evaluation (IITP), South Korea, grant number 2019-0-01396, "Development of framework for analyzing, detecting, mitigating of bias in AI model and training data," South Korea. Opinions and findings are those of the authors, and are not endorsed by IITP.
Publisher Copyright:
© 2021 The Author(s)
PY - 2022/1
Y1 - 2022/1
N2 - Concerns have been raised regarding performance disparity in automatic speech recognition (ASR) systems, as they provide unequal transcription accuracy for user groups defined by attributes such as gender, dialect, and race. In this paper, we propose the "equal accuracy ratio", a novel inclusiveness measure for ASR systems that can be seamlessly integrated into the standard connectionist temporal classification (CTC) training pipeline of an end-to-end neural speech recognizer to increase the recognizer's inclusiveness. We also create a novel multi-dialect benchmark dataset to study the inclusiveness of ASR by combining data from existing corpora in seven dialects of English (African American, General American, Latino English, British English, Indian English, Afrikaner English, and Xhosa English). Experiments on this multi-dialect corpus show that using the equal accuracy ratio as a regularization term along with the CTC loss succeeds in lowering the accuracy gap between user groups and reduces the recognition error rate compared with a non-regularized baseline. Experiments on additional speech corpora with different user groups also confirm our findings.
AB - Concerns have been raised regarding performance disparity in automatic speech recognition (ASR) systems, as they provide unequal transcription accuracy for user groups defined by attributes such as gender, dialect, and race. In this paper, we propose the "equal accuracy ratio", a novel inclusiveness measure for ASR systems that can be seamlessly integrated into the standard connectionist temporal classification (CTC) training pipeline of an end-to-end neural speech recognizer to increase the recognizer's inclusiveness. We also create a novel multi-dialect benchmark dataset to study the inclusiveness of ASR by combining data from existing corpora in seven dialects of English (African American, General American, Latino English, British English, Indian English, Afrikaner English, and Xhosa English). Experiments on this multi-dialect corpus show that using the equal accuracy ratio as a regularization term along with the CTC loss succeeds in lowering the accuracy gap between user groups and reduces the recognition error rate compared with a non-regularized baseline. Experiments on additional speech corpora with different user groups also confirm our findings.
KW - Fairness
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85121444596&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121444596&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2021.11.004
DO - 10.1016/j.specom.2021.11.004
M3 - Article
AN - SCOPUS:85121444596
SN - 0167-6393
VL - 136
SP - 76
EP - 83
JO - Speech Communication
JF - Speech Communication
ER -