Glimpsing speech interrupted by speech-modulated noise

Rachel E. Miller, Bobby E. Gibbs, Daniel Fogerty

Research output: Contribution to journal › Article › peer-review

Abstract

Everyday environments frequently present speech in modulated noise backgrounds, such as from a competing talker. Under such conditions, temporal glimpses of speech may be preserved at favorable signal-to-noise ratios during the amplitude dips of the masker. Speech recognition is determined, in part, by these speech glimpses. However, properties of the noise when it dominates the speech may also be important. This study interrupted speech to provide either high-intensity or low-intensity speech glimpses derived from measurements of speech-on-speech masking. The interrupted intervals were deleted and subsequently filled by steady-state noise or by one of four different types of noise amplitude-modulated by the same or a different sentence. Noise was presented at two different levels. Interruption by silence was also examined. Speech recognition was best with high-intensity glimpses and improved when the noise was modulated by the missing high-intensity segments. Additional noise conditions revealed significant interactions between the noise level and the glimpsed speech level. Overall, high-intensity speech segments, and the amplitude modulation (AM) of those segments, are crucial for speech recognition. Speech recognition is further influenced by the properties of the competing noise (i.e., level and AM), which interact with the glimpsed speech level. Acoustic properties of both speech-dominated and noise-dominated intervals of speech-noise mixtures determine speech recognition.
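
The interruption-and-fill manipulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the glimpse-selection rule, so the median-envelope threshold, the moving-average envelope, the function names `envelope` and `interrupt_and_fill`, and all parameter values below are assumptions chosen only for illustration, and the "speech" is a synthetic amplitude-modulated tone.

```python
import numpy as np

def envelope(x, win):
    """Amplitude envelope: rectify, then smooth with a moving average of `win` samples."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")

def interrupt_and_fill(speech, modulator=None, win=160, keep_high=True,
                       noise_gain=1.0, seed=0):
    """Keep high- or low-intensity portions of `speech`; fill the rest with noise.

    Assumption: "high intensity" means the envelope is at or above its median.
    modulator=None gives steady-state noise; noise_gain=0 gives silence.
    """
    env = envelope(speech, win)
    thresh = np.median(env)
    keep = env >= thresh if keep_high else env < thresh

    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(speech))
    if modulator is not None:
        # Impose the modulating sentence's envelope on the noise carrier.
        noise = noise * envelope(modulator, win)

    filler = noise_gain * noise
    # Retained glimpses pass through unchanged; deleted intervals get the filler.
    return np.where(keep, speech, filler), keep

# Synthetic stand-in for a sentence: a 200 Hz tone with a slow 4 Hz modulation.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

# Noise modulated by the same sentence, so the filler follows the deleted segments.
same_mod, keep_hi = interrupt_and_fill(speech, modulator=speech, keep_high=True)
# Steady-state noise and silence fillers for comparison.
steady, _ = interrupt_and_fill(speech, modulator=None, keep_high=True)
silence, _ = interrupt_and_fill(speech, noise_gain=0.0, keep_high=True)
# Low-intensity glimpses: the complementary set of samples is retained.
low_glimpse, keep_lo = interrupt_and_fill(speech, modulator=speech, keep_high=False)
```

Passing a different signal as `modulator` would correspond to noise modulated by a different sentence, and `noise_gain` stands in for the two noise levels mentioned above.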

Original language: English (US)
Pages (from-to): 3058-3067
Number of pages: 10
Journal: Journal of the Acoustical Society of America
Volume: 143
Issue number: 5
State: Published - May 1 2018
Externally published: Yes

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics
