Voicing-, voiceless-, and non-glimpses in speech intelligibility prediction

Yinglun Sun, Yan Tang

Research output: Contribution to conference › Abstract › peer-review


The number of speech spectro-temporal (S-T) regions escaping from noise masking, known as “glimpses,” is proportional to speech intelligibility in noise. Previous studies have demonstrated that intelligibility can be estimated by calculating the glimpse proportion (GP). More recent evidence has revealed that the contribution of glimpses to intelligibility varies with the energy level of the glimpsed regions, and that even non-glimpsed regions play a non-negligible role in speech perception in noise. This study incorporated voicing-voiceless information into glimpse-based intelligibility estimation. Before computing the GP, the counts of raw glimpsed regions, or of those with energy above the mean noise level, were weighted according to the voicing-voiceless status of the frame in which the glimpses were detected. Evaluated using speech signals processed to have thirteen glimpse compositions in both temporally stationary and fluctuating noise maskers, the linear correlation between model predictions and listeners' word recognition rates increased from 0.76 to 0.80 for weighted GP, and from 0.89 to 0.92 for weighted high-energy GP. Further taking the contribution of non-glimpsed regions into account in the model improved the correlation to 0.95, suggesting that intelligibility in noise can be better predicted when the contributions of different speech regions are finely modelled.
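The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the 3 dB local-SNR criterion for detecting a glimpse, the frame weights, the normalisation by the total weighted number of S-T regions, and the use of a dB-domain mean for the noise level are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def weighted_glimpse_proportion(speech_db, noise_db, voiced,
                                w_voiced=1.0, w_voiceless=1.0,
                                snr_threshold_db=3.0,
                                high_energy_only=False):
    """Hypothetical voicing-weighted glimpse proportion.

    speech_db, noise_db: arrays of shape (n_channels, n_frames) holding the
        S-T energy of speech and noise in dB.
    voiced: boolean array of shape (n_frames,), True for voiced frames.
    w_voiced, w_voiceless: illustrative frame weights (assumed values).
    snr_threshold_db: assumed local-SNR criterion for a glimpse.
    high_energy_only: if True, keep only glimpses whose speech energy
        exceeds the mean noise level (mean taken over dB values here,
        which is an assumption).
    """
    speech_db = np.asarray(speech_db, dtype=float)
    noise_db = np.asarray(noise_db, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)

    # A region counts as glimpsed when its local SNR exceeds the threshold.
    glimpsed = (speech_db - noise_db) > snr_threshold_db

    if high_energy_only:
        # Restrict to glimpses with energy above the mean noise level.
        glimpsed = glimpsed & (speech_db > noise_db.mean())

    # One weight per frame, chosen by the frame's voicing status.
    frame_weights = np.where(voiced, w_voiced, w_voiceless)

    # Weighted glimpse count, normalised by the weighted total of regions.
    weighted_count = (glimpsed * frame_weights[np.newaxis, :]).sum()
    weighted_total = frame_weights.sum() * speech_db.shape[0]
    return weighted_count / weighted_total
```

With equal weights this reduces to the plain glimpse proportion; setting `high_energy_only=True` gives a high-energy variant under the same assumptions. The contribution of non-glimpsed regions mentioned in the abstract is not modelled in this sketch.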
Original language: English (US)
State: Published - 2023
Event: 184th Meeting of the Acoustical Society of America, ASA 2023
Duration: May 8 2023 to May 12 2023


Conference: 184th Meeting of the Acoustical Society of America, ASA 2023