TY - JOUR
T1 - Speech intelligibility prediction using spectro-temporal modulation analysis
AU - Edraki, Amin
AU - Chan, Wai Yip
AU - Jensen, Jesper
AU - Fogerty, Daniel
N1 - Funding Information:
Manuscript received June 10, 2020; revised September 27, 2020, October 29, 2020, and November 5, 2020; accepted November 6, 2020. Date of publication November 24, 2020; date of current version December 7, 2020. The work of Amin Edraki and Wai-Yip Chan was supported in part by the Natural Sciences and Engineering Research Council of Canada, the Demant Foundation, and the Vector Institute. The work of Daniel Fogerty was supported in part by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders, under Grant R01-DC015465. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Timo Gerkmann. (Corresponding author: Amin Edraki).
Publisher Copyright:
© 2014 IEEE.
PY - 2021
Y1 - 2021
N2 - Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
KW - Spectro-temporal modulation
KW - speech intelligibility
KW - speech quality model
UR - http://www.scopus.com/inward/record.url?scp=85097126929&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097126929&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.3039929
DO - 10.1109/TASLP.2020.3039929
M3 - Article
AN - SCOPUS:85097126929
SN - 2329-9290
VL - 29
SP - 210
EP - 225
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 9269417
ER -