A spectro-temporal glimpsing index (STGI) for speech intelligibility prediction

Amin Edraki, Wai Yip Chan, Jesper Jensen, Daniel Fogerty

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We propose a monaural intrusive speech intelligibility prediction (SIP) algorithm called STGI based on detecting glimpses in short-time segments in a spectro-temporal modulation decomposition of the input speech signals. Unlike existing glimpse-based SIP methods, the application of STGI is not limited to additive uncorrelated noise; STGI can be employed in a broad range of degradation conditions. Our results show that STGI performs consistently well across 15 datasets covering degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, checkerboard noise, and gated noise.

Original languageEnglish (US)
Title of host publication22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PublisherInternational Speech Communication Association
Pages2738-2742
Number of pages5
ISBN (Electronic)9781713836902
DOIs
StatePublished - 2021
Event22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: Aug 30 2021Sep 3 2021

Publication series

NameProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume4
ISSN (Print)2308-457X
ISSN (Electronic)1990-9772

Conference

Conference22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/TerritoryCzech Republic
CityBrno
Period8/30/219/3/21

Keywords

  • Glimpsing
  • Spectro-temporal modulation
  • Speech intelligibility
  • Speech quality model

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'A spectro-temporal glimpsing index (STGI) for speech intelligibility prediction'. Together they form a unique fingerprint.

Cite this