We are interested in the problem of robust understanding from noisy spontaneous speech input. In goal-driven human-machine dialog, utterance classification is a key component of the understanding process, used to determine the intent of the speaker. In this paper we propose a novel algorithm that exploits automatic speech recognition (ASR) word confidence scores for better classification of spoken utterances. ASR word confidence scores provide estimates of word error rates. While previous work has focused on straightforward combination of word confidence scores into Bayesian classifiers, in this paper we extend the mathematical formulation for Boosting classifiers. This extension of the algorithm makes it possible to exploit confidence scores from a 1-best ASR output or from word confusion networks (WCNs). We present methods for on-line and off-line score combinations. The results we show are for a large database of utterances collected using the AT&T VoiceTone℠ spoken dialog system. Our experiments show a 5% to 10% reduction in error (1 − precision) at a given recall when using WCNs compared to ASR output.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Published - 2004
Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que, Canada
Duration: May 17 2004 → May 21 2004
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering