Spectro-temporal modulation glimpsing for speech intelligibility prediction

Amin Edraki, Wai Yip Chan, Jesper Jensen, Daniel Fogerty

Research output: Contribution to journal › Review article › peer-review

Abstract

We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by “glimpsing” partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
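The two glimpsing definitions above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold values, the array shapes, and the mean removal before correlation are all assumptions made for illustration, and the filterbanks and short-time segmentation that real GP and STGI implementations rely on are omitted.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def glimpse_proportion(speech_pow, noise_pow, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds a threshold.

    speech_pow, noise_pow: non-negative power arrays of identical shape
    (frequency bands x time frames). threshold_db is an assumed value.
    """
    local_snr_db = 10.0 * np.log10((speech_pow + EPS) / (noise_pow + EPS))
    return float(np.mean(local_snr_db > threshold_db))


def stm_glimpse_proportion(clean_env, degraded_env, threshold=0.5):
    """Fraction of modulation channels whose clean and degraded envelopes
    are similar, measured by normalized cross-correlation.

    clean_env, degraded_env: arrays of shape (channels, frames).
    Mean removal and the threshold value are assumptions of this sketch.
    """
    c = clean_env - clean_env.mean(axis=1, keepdims=True)
    d = degraded_env - degraded_env.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(c, axis=1) * np.linalg.norm(d, axis=1) + EPS
    ncc = np.sum(c * d, axis=1) / denom
    return float(np.mean(ncc > threshold))
```

In this sketch both measures reduce to the same operation, a proportion of units passing a threshold. What differs is the unit (time-frequency cell versus modulation channel) and the criterion (local signal-to-noise ratio versus envelope similarity). Because the similarity criterion needs only the clean and degraded signals rather than a separate noise signal, it is plausible that this is what lets the second measure apply to non-linear processing.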

Original language: English (US)
Article number: 108620
Journal: Hearing Research
Volume: 426
State: Published - Dec 2022

Keywords

  • Glimpsing
  • Spectro-temporal modulation
  • Speech intelligibility

ASJC Scopus subject areas

  • Sensory Systems
