TY - JOUR
T1 - A method to identify noise-robust perceptual features
T2 - Application for consonant /t/
AU - Régnier, Marion S.
AU - Allen, Jont B.
N1 - Funding Information:
The authors wish to acknowledge the support of the Human Speech Recognition Group for the valuable comments and data collection. The authors would like to thank Etymotic Research and Starkey Laboratories for the financial support used to pay the subjects in the many experiments. Financial support was mostly provided by the ECE Department, UIUC. This work constitutes a portion of the first author's MS thesis.
PY - 2008
Y1 - 2008
N2 - This study focuses on correlating speech confusion patterns, defined as consonant-vowel confusion as a function of the speech-to-noise ratio, and a model acoustic feature (AF) representation called the AI gram, defined as the articulation index density in the spectrotemporal domain. By collecting many responses from many talkers and listeners, the AF and psychophysical feature (event) is shown to be correlated via the AI-gram model and the confusion matrices at the utterance level, thereby explaining the listener confusion. Consonant /t/ is used as an example to identify its primary robust-to-noise feature, and a precise correlation of the acoustic information with the listeners' confusions is used to label the event. The main spectrotemporal cue defining the /t/ event is an across-frequency temporal coincidence, wherein frequency spread and robustness vary across utterances, while the event remains invariant. The cross-frequency timing event is shown to be the key perceptual feature for consonants in a vowel following context. Coincidences are found to form the basic element of the auditory object. Neural circuits used for coincidence in binaural processing for localization across ears are proposed to be used within one ear across channels. It is further concluded that the event is based on the audibility of the /t/ burst rather than on any superthreshold property.
UR - http://www.scopus.com/inward/record.url?scp=43549101355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43549101355&partnerID=8YFLogxK
U2 - 10.1121/1.2897915
DO - 10.1121/1.2897915
M3 - Article
C2 - 18529196
AN - SCOPUS:43549101355
SN - 0001-4966
VL - 123
SP - 2801
EP - 2814
JO - Journal of the Acoustical Society of America
JF - Journal of the Acoustical Society of America
IS - 5
ER -