Abstract

The classic Miller and Nicely (1955) [MN55] confusion matrix experiment (16 consonants, white noise masker) was repeated using computerized procedures similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized into three sets: a low-error set {/m/, /n/}, an average-error set {/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/}, and a high-error set {/f/, /θ/, /b/, /v/, /ð/}. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, which are biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error, and the errors for most of the individual consonants and consonant sets, can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.
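To illustrate what "error as an exponential function of the AI" means in practice, the following is a minimal sketch, not the authors' procedure. It assumes a model of the form error = e0 * e_min**AI and recovers the two parameters with a log-linear least-squares fit. The function name `fit_exponential_error`, the parameter names `e0` and `e_min`, and the data points are all hypothetical; the data are generated synthetically from assumed parameters, not taken from the paper.

```python
import math


def fit_exponential_error(ai_values, errors):
    """Fit error = e0 * e_min**AI by least squares on log(error).

    Taking logs gives log(error) = log(e0) + AI * log(e_min),
    which is a straight line in AI.
    """
    n = len(ai_values)
    log_errors = [math.log(e) for e in errors]
    mean_x = sum(ai_values) / n
    mean_y = sum(log_errors) / n
    sxx = sum((x - mean_x) ** 2 for x in ai_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ai_values, log_errors))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


# Synthetic, noise-free data (assumption, for illustration only).
# 15/16 is the chance error for a 16-alternative closed-set task;
# the value 0.01 for e_min is made up.
assumed_e0 = 15 / 16
assumed_e_min = 0.01
ai = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
err = [assumed_e0 * assumed_e_min ** a for a in ai]

e0, e_min = fit_exponential_error(ai, err)
print(f"e0 = {e0:.4f}, e_min = {e_min:.4f}")
```

With real confusion-matrix scores, the points would scatter around the fitted line, and the quality of the fit on a log-error axis is one way to judge how well the exponential approximation holds for a given consonant.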

Original language: English (US)
Pages (from-to): 1220-1233
Number of pages: 14
Journal: Journal of the Acoustical Society of America
Volume: 124
Issue number: 2
State: Published - Aug 18 2008

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics
