A statistical model for robust integration of narrowband cues in speech

Lawrence K. Saul, Mazin G. Rahim, Jont B. Allen

Research output: Contribution to journal › Article › peer-review

Abstract

We investigate a statistical model for integrating narrowband cues in speech. The model is inspired by two ideas in human speech perception: (i) Fletcher's hypothesis (1953) that independent detectors, working in narrow frequency bands, account for the robustness of auditory strategies, and (ii) Miller and Nicely's analysis (1955) that perceptual confusions in noisy bandlimited speech are correlated with phonetic features. We apply the model to detecting the phonetic feature [+/-sonorant] that distinguishes vowels, approximants, and nasals (sonorants) from stops, fricatives, and affricates (obstruents). The model is represented by a multilayer probabilistic network whose binary hidden variables indicate sonorant cues from different parts of the frequency spectrum. We derive the Expectation-Maximization algorithm for estimating the model's parameters and evaluate its performance on clean and corrupted speech.

Original language: English (US)
Pages (from-to): 175-194
Number of pages: 20
Journal: Computer Speech and Language
Volume: 15
Issue number: 2
State: Published - Apr 2001
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
