TY - JOUR
T1 - Improving intelligibility prediction under informational masking using an auditory saliency model
AU - Tang, Yan
AU - Cox, Trevor J.
N1 - Funding Information:
This work was supported by the EPSRC Programme Grant S3A: Future Spatial Audio for an Immersive Listener Experience at Home (EP/L000539/1) and the BBC as part of the BBC Audio Research Partnership.
Publisher Copyright:
Copyright © 2018 DAFx. All rights reserved.
PY - 2018
Y1 - 2018
N2 - The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIMs) estimate intelligibility by quantifying EM; few model the effect of IM in detail. In this study, an auditory saliency model, which aims to estimate the probability that each sound source captures auditory attention through a bottom-up process, was integrated into an OIM to improve intelligibility prediction under IM. While EM is accounted for by the original OIM, IM is assumed to arise from the listener’s attention switching between the target and competing sounds in the auditory scene. The proposed method was evaluated alongside three reference OIMs by comparing model predictions with listeners’ word recognition rates for different noise maskers, some of which introduce IM. The results show that the predictive accuracy of the proposed method is as good as the best reported in the literature. The proposed method does, however, offer a physiologically plausible approach to modelling both IM and EM.
AB - The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIMs) estimate intelligibility by quantifying EM; few model the effect of IM in detail. In this study, an auditory saliency model, which aims to estimate the probability that each sound source captures auditory attention through a bottom-up process, was integrated into an OIM to improve intelligibility prediction under IM. While EM is accounted for by the original OIM, IM is assumed to arise from the listener’s attention switching between the target and competing sounds in the auditory scene. The proposed method was evaluated alongside three reference OIMs by comparing model predictions with listeners’ word recognition rates for different noise maskers, some of which introduce IM. The results show that the predictive accuracy of the proposed method is as good as the best reported in the literature. The proposed method does, however, offer a physiologically plausible approach to modelling both IM and EM.
UR - http://www.scopus.com/inward/record.url?scp=85067132836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067132836&partnerID=8YFLogxK
M3 - Conference article
SP - 113
EP - 119
JO - Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18)
JF - Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18)
T2 - 21st International Conference on Digital Audio Effects, DAFx 2018
Y2 - 4 September 2018 through 8 September 2018
ER -