Glimpse-based estimation of speech intelligibility from speech-in-noise using artificial neural networks

Research output: Contribution to journal › Article › peer-review

Abstract

While human listeners can, to some extent, understand the information conveyed by the speech signal when it is mixed with noise, traditional objective intelligibility measures usually fail to operate without a priori knowledge of the clean speech signal. This limits the usability of those measures in situations where the clean speech signal is inaccessible. In this paper, a glimpse-based method is extended to make speech intelligibility predictions directly from speech-plus-noise mixtures. Using a neural network, the proposed method estimates the time-frequency regions with a local speech-to-noise ratio above a given threshold -- known as glimpses -- from the mixture signal, instead of separately comparing the speech signal against the noise signal. The number and locations of the glimpses can then be used to produce an intelligibility score. In Experiment I, where listener intelligibility was measured in one stationary and nine fluctuating noise maskers, the predictions produced by the proposed method were highly correlated with the subjective data, with correlation coefficients above 0.90. In Experiment II, using the same neural network as in Experiment I (trained on normal natural speech), the proposed method was used to predict the intelligibility of synthetic speech and of speech modified by intelligibility-enhancement algorithms. The method retained its predictive power, performing similarly to its intrusive counterpart with an overall correlation coefficient of 0.81, which exceeds that of many modern traditional measures evaluated under the same conditions. Therefore, the proposed method can be used to estimate speech intelligibility in place of traditional measures in conditions where their capacity falls short.
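To make the notion of a glimpse concrete, the following minimal Python sketch marks time-frequency cells whose local speech-to-noise ratio exceeds a threshold and reports the proportion of such cells. It illustrates only the intrusive variant, in which speech and noise power are available separately. The proposed method instead estimates the glimpse mask from the mixture with a neural network, which is not reproduced here. The function names, the 3 dB threshold, and the toy power values are assumptions chosen for illustration and are not taken from the paper, and the mapping from glimpses to an intelligibility score is omitted.

```python
import math


def local_snr_db(speech_power, noise_power, eps=1e-12):
    """Local speech-to-noise ratio in dB for one time-frequency cell."""
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((speech_power + eps) / (noise_power + eps))


def glimpse_mask(speech_tf, noise_tf, threshold_db=3.0):
    """Boolean mask: True where the local SNR exceeds threshold_db.

    speech_tf and noise_tf are equally shaped 2-D lists of power values
    (rows = frequency channels, columns = time frames). The 3 dB default
    is an assumed value, not one stated in the abstract.
    """
    return [
        [local_snr_db(s, n) > threshold_db for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_tf, noise_tf)
    ]


def glimpse_proportion(mask):
    """Fraction of time-frequency cells that count as glimpses."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(cell for row in mask for cell in row) / total


# Toy example: 2 frequency channels x 3 time frames, hypothetical values.
speech_tf = [[4.0, 1.0, 0.5], [8.0, 0.1, 3.0]]
noise_tf = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

mask = glimpse_mask(speech_tf, noise_tf)
gp = glimpse_proportion(mask)
print(mask)
print(gp)
```

In a non-intrusive setting, the mask returned by `glimpse_mask` would be replaced by a mask predicted from the mixture alone, while a counting step like `glimpse_proportion` could stay the same.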
Original language: English (US)
Article number: 101220
Journal: Computer Speech and Language
Volume: 69
State: Published - Sep 2021

Keywords

  • Artificial neural network
  • Glimpse
  • Noise
  • Non-intrusive
  • Objective intelligibility measure
  • Speech intelligibility

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
