TY - JOUR
T1 - A statistical model for robust integration of narrowband cues in speech
AU - Saul, Lawrence K.
AU - Rahim, Mazin G.
AU - Allen, Jont B.
N1 - Copyright: Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2001/4
Y1 - 2001/4
N2 - We investigate a statistical model for integrating narrowband cues in speech. The model is inspired by two ideas in human speech perception: (i) Fletcher's hypothesis (1953) that independent detectors, working in narrow frequency bands, account for the robustness of auditory strategies, and (ii) Miller and Nicely's analysis (1955) that perceptual confusions in noisy bandlimited speech are correlated with phonetic features. We apply the model to detecting the phonetic feature [+/-sonorant] that distinguishes vowels, approximants, and nasals (sonorants) from stops, fricatives, and affricates (obstruents). The model is represented by a multilayer probabilistic network whose binary hidden variables indicate sonorant cues from different parts of the frequency spectrum. We derive the Expectation-Maximization algorithm for estimating the model's parameters and evaluate its performance on clean and corrupted speech.
AB - We investigate a statistical model for integrating narrowband cues in speech. The model is inspired by two ideas in human speech perception: (i) Fletcher's hypothesis (1953) that independent detectors, working in narrow frequency bands, account for the robustness of auditory strategies, and (ii) Miller and Nicely's analysis (1955) that perceptual confusions in noisy bandlimited speech are correlated with phonetic features. We apply the model to detecting the phonetic feature [+/-sonorant] that distinguishes vowels, approximants, and nasals (sonorants) from stops, fricatives, and affricates (obstruents). The model is represented by a multilayer probabilistic network whose binary hidden variables indicate sonorant cues from different parts of the frequency spectrum. We derive the Expectation-Maximization algorithm for estimating the model's parameters and evaluate its performance on clean and corrupted speech.
UR - http://www.scopus.com/inward/record.url?scp=0035323922&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035323922&partnerID=8YFLogxK
U2 - 10.1006/csla.2001.0164
DO - 10.1006/csla.2001.0164
M3 - Article
AN - SCOPUS:0035323922
SN - 0885-2308
VL - 15
SP - 175
EP - 194
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 2
ER -