Abstract
Intelligibility predictors tell us a great deal about human speech perception, in particular which acoustic factors strongly affect human behavior and which do not. One intelligibility predictor, the Articulation Index (AI), is interesting because it models human behavior in noise, and its form has implications for how speech is represented in the brain. Specifically, the Articulation Index implies that a listener pre-consciously estimates the masking-noise distribution and uses it to classify time/frequency samples as speech or non-speech. We classify consonants using representations of speech and noise that are consistent with this hypothesis, and we determine whether their error rates and error patterns match human behavior more or less closely than the representations typical of automatic speech recognition systems. The new representations produced error patterns more similar to those of humans in cases where the testing and training data sets do not share the same masking-noise spectrum.
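As a rough illustration of the hypothesis the abstract describes, the sketch below estimates a masking-noise spectrum from a noisy signal and labels each time/frequency sample as speech or non-speech. This is a minimal sketch only: the per-frequency median noise estimate, the 3 dB decision threshold, and the function name `ai_style_mask` are illustrative assumptions, not the paper's actual front end.

```python
"""AI-inspired speech/non-speech time/frequency mask (illustrative sketch)."""
import numpy as np
from scipy.signal import stft


def ai_style_mask(x, fs, thresh_db=3.0):
    # Short-time power spectrogram of the noisy signal.
    _, _, Z = stft(x, fs=fs, nperseg=512)
    power = np.abs(Z) ** 2

    # Assumed noise estimate: per-frequency median across frames,
    # standing in for the listener's estimate of the masking-noise
    # distribution.
    noise = np.median(power, axis=1, keepdims=True)

    # Label a time/frequency sample "speech" when its local SNR
    # exceeds thresh_db; otherwise label it "non-speech".
    snr_db = 10.0 * np.log10(power / np.maximum(noise, 1e-12))
    return snr_db > thresh_db  # boolean speech/non-speech mask
```

Under this reading, a consonant classifier would operate on the masked spectrogram rather than on raw spectral features, which is one way a representation could be made "consistent with this hypothesis" in the abstract's sense.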
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 185-194 |
| Number of pages | 10 |
| Journal | Speech Communication |
| Volume | 53 |
| Issue number | 2 |
| DOIs | |
| State | Published - Feb 2011 |
Keywords
- Articulation Index
- Speech perception
- Speech recognition
- Speech representation
ASJC Scopus subject areas
- Software
- Modeling and Simulation
- Communication
- Language and Linguistics
- Linguistics and Language
- Computer Vision and Pattern Recognition
- Computer Science Applications