The purpose of this paper is to provide insight into how speech is processed by the auditory system by quantifying the nature of nonsense speech sound confusions. (1) The Miller and Nicely [J. Acoust. Soc. Am. 27(2), 338-352 (1955)] confusion matrix (CM) data are analyzed by plotting the CM elements $S_{i,j}(\mathrm{SNR})$ as a function of the signal-to-noise ratio (SNR). This allows for the robust clustering of perceptual feature (event) groups, which are not robustly defined by a single CM table, where the clusters depend on the sound order. (2) The SNR is then re-expressed as an articulation index (AI), which is used as the independent variable. The normalized log scores $\log(1 - S_{i,i}(AI))$ and $\log(S_{i,j}(AI))$, $j \neq i$, then become linear functions of AI on log-error versus AI plots. This linear dependence may be interpreted as an extension of the band-independence model of Fletcher. (3) The model formula for the average score in the finite-alphabet case, $P_c(AI, H) = \sum_{i=1}^{N} S_{i,i}/N$, is then modified to include the effect of entropy $H$. Due to the grouping of sounds with increased SNR (and AI), the sound-group entropy $H_g$ plays a key role in this performance measure. (4) A parametric model for the confusions $S_{i,j}(AI, H_g)$ is then described, which characterizes the confusions between competing sounds within a group.
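The quantities named in steps (2) and (3) can be illustrated with a minimal sketch. It row-normalizes a confusion matrix of response counts, computes the average score as the mean of the diagonal, and checks that a log-error versus AI plot is a straight line when the error follows a power law in AI. The count matrix, the AI values, and the constant `e_min` are hypothetical illustration values, not Miller and Nicely data, and the form `e_min ** AI` is assumed here as the classical Fletcher-style error model; the abstract itself states only that the log error is linear in AI.

```python
import numpy as np

def row_normalize(counts):
    """Convert a matrix of response counts into row-stochastic scores S[i, j]."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def average_score(S):
    """Average score P_c = (1/N) * sum_i S[i, i] for an N-sound alphabet."""
    return float(np.trace(S) / S.shape[0])

def log_error(S):
    """Per-sound log error log10(1 - S[i, i])."""
    return np.log10(1.0 - np.diag(S))

# Hypothetical 3-sound count matrix (rows: spoken sound, columns: reported sound).
counts = np.array([[80, 15, 5],
                   [10, 85, 5],
                   [5, 5, 90]])
S = row_normalize(counts)
P_c = average_score(S)
per_sound_log_error = log_error(S)

# Assumed error model: error = e_min ** AI, so log10(error) is linear in AI
# with slope log10(e_min) and zero intercept. e_min is a hypothetical constant.
e_min = 0.02
ai = np.array([0.2, 0.4, 0.6, 0.8])
error = e_min ** ai
slope, intercept = np.polyfit(ai, np.log10(error), 1)
```

Under this assumed model the fitted slope recovers `log10(e_min)`, which is what a straight line on a log-error versus AI plot would show for measured scores.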
ASJC Scopus subject areas
- Arts and Humanities (miscellaneous)
- Acoustics and Ultrasonics