TY - GEN
T1 - Joint modeling of accents and acoustics for multi-accent speech recognition
AU - Yang, Xuesong
AU - Audhkhasi, Kartik
AU - Rosenberg, Andrew
AU - Thomas, Samuel
AU - Ramabhadran, Bhuvana
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to dealing with multiple accents involves pooling data from several accents during training and building a single model in a multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternative model in which we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline. We obtain a 5.94% relative improvement in word error rate on British English and a 9.47% relative improvement on American English. This illustrates that joint modeling with accent information improves acoustic model performance.
AB - The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to dealing with multiple accents involves pooling data from several accents during training and building a single model in a multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternative model in which we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline. We obtain a 5.94% relative improvement in word error rate on British English and a 9.47% relative improvement on American English. This illustrates that joint modeling with accent information improves acoustic model performance.
KW - Acoustic modeling
KW - End-to-end models
KW - Multi-accent speech recognition
KW - Multi-task learning
UR - http://www.scopus.com/inward/record.url?scp=85054199427&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054199427&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462557
DO - 10.1109/ICASSP.2018.8462557
M3 - Conference contribution
AN - SCOPUS:85054199427
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5989
EP - 5993
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -