TY - JOUR
T1 - The Intelligibility Benefits of Modern Computer-Synthesized Speech for Normal-Hearing and Hearing-Impaired Listeners in Non-Ideal Listening Conditions
AU - Ma, Yizhen
AU - Tang, Yan
PY - 2024
Y1 - 2024
N2 - Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often listen to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasingly used in lieu of actual human voices in human–machine interfaces, such as public announcement systems, answering machines, virtual personal assistants, and GPS, to interact with users. However, previous studies showed that speech generated by computer speech synthesizers was often intrinsically less natural and intelligible than natural speech produced by human speakers. In terms of noise, listening to synthetic speech is challenging for listeners with normal hearing (NH), not to mention for hearing-impaired (HI) listeners. Recent developments in speech synthesis have significantly improved the naturalness of synthetic speech. In this study, the intelligibility of speech generated by commercial synthesizers from Google, Amazon, and Microsoft was evaluated by both NH and HI listeners in different noise conditions. Compared to a natural female voice as the baseline, listeners’ listening performance suggested that some of the synthetic speech was significantly more intelligible even at rather adverse listening conditions for the NH cohort. Further acoustical analyses revealed that elongated vowel sounds and reduced spectral tilt were primarily responsible for improved intelligibility for NH, but not for HI due to their impairment at high frequencies and possible cognitive decline associated with aging.
AB - Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often listen to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasingly used in lieu of actual human voices in human–machine interfaces, such as public announcement systems, answering machines, virtual personal assistants, and GPS, to interact with users. However, previous studies showed that speech generated by computer speech synthesizers was often intrinsically less natural and intelligible than natural speech produced by human speakers. In terms of noise, listening to synthetic speech is challenging for listeners with normal hearing (NH), not to mention for hearing-impaired (HI) listeners. Recent developments in speech synthesis have significantly improved the naturalness of synthetic speech. In this study, the intelligibility of speech generated by commercial synthesizers from Google, Amazon, and Microsoft was evaluated by both NH and HI listeners in different noise conditions. Compared to a natural female voice as the baseline, listeners’ listening performance suggested that some of the synthetic speech was significantly more intelligible even at rather adverse listening conditions for the NH cohort. Further acoustical analyses revealed that elongated vowel sounds and reduced spectral tilt were primarily responsible for improved intelligibility for NH, but not for HI due to their impairment at high frequencies and possible cognitive decline associated with aging.
KW - intelligibility
KW - Mandarin Chinese
KW - intelligibility model
KW - noise
KW - hearing-impaired
KW - synthetic speech
U2 - 10.3390/ohbm5010005
DO - 10.3390/ohbm5010005
M3 - Article
SN - 2504-463X
VL - 5
JO - Journal of Otorhinolaryngology, Hearing and Balance Medicine
JF - Journal of Otorhinolaryngology, Hearing and Balance Medicine
IS - 1
ER -