TY - JOUR
T1 - Consonant confusions in white noise
AU - Phatak, Sandeep A.
AU - Lovitt, Andrew
AU - Allen, Jont B.
N1 - Funding Information:
We thank all members of the HSR group at Beckman Institute, UIUC, for their input. We thank the three anonymous reviewers and the Associate Editor for their constructive comments and encouragement. This research was partially supported by a University of Illinois grant. The data-collection expenses were covered through research funding provided by Etymotic Research and by Starkey Labs.
PY - 2008
Y1 - 2008
N2 - The classic [MN55: Miller and Nicely (1955)] confusion matrix experiment (16 consonants, white noise masker) was repeated using computerized procedures similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: low-error set {/m/, /n/}, average-error set {/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/}, and high-error set {/f/, /θ/, /b/, /v/, /ð/}. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.
AB - The classic [MN55: Miller and Nicely (1955)] confusion matrix experiment (16 consonants, white noise masker) was repeated using computerized procedures similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: low-error set {/m/, /n/}, average-error set {/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/}, and high-error set {/f/, /θ/, /b/, /v/, /ð/}. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.
UR - http://www.scopus.com/inward/record.url?scp=49249135127&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49249135127&partnerID=8YFLogxK
U2 - 10.1121/1.2913251
DO - 10.1121/1.2913251
M3 - Article
C2 - 18681609
AN - SCOPUS:49249135127
SN - 0001-4966
VL - 124
SP - 1220
EP - 1233
JO - Journal of the Acoustical Society of America
JF - Journal of the Acoustical Society of America
IS - 2
ER -