Abstract
In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation in the cepstral domain, based on a newly derived generalized expectation maximization algorithm. First, missing feature marginalization in the cepstral domain is established using a linear transformation, after tying the mean and variance of the non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm so that it directly optimizes ASR performance. Experimental results show that even in a highly non-stationary mismatched condition (dance music as background noise), the proposed method achieves absolute ASR accuracy improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR over the conventional noise suppression method.
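For readers unfamiliar with the term, the sketch below illustrates the commonly used oracle definition of an ideal binary mask: a time-frequency cell is kept (1) when its local SNR exceeds a threshold and discarded (0) otherwise. It is not the estimation method proposed in the paper, which has no access to the clean speech and noise separately. The function name `ideal_binary_mask`, the threshold parameter `lc_db`, and the toy power values are assumptions chosen for illustration.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask over time-frequency cells.

    speech_power, noise_power: lists of frames, each a list of per-band
    power values with matching shapes (hypothetical toy inputs).
    lc_db: local SNR threshold in dB (assumed default of 0 dB).
    """
    mask = []
    for s_frame, n_frame in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_frame, n_frame):
            if n <= 0.0:
                # No noise energy: keep the cell only if speech is present.
                row.append(1 if s > 0.0 else 0)
            elif s <= 0.0:
                # No speech energy: the cell is noise-dominated.
                row.append(0)
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


# Toy example: 2 frames x 3 frequency bands.
speech = [[4.0, 0.5, 2.0], [0.1, 3.0, 1.0]]
noise = [[1.0, 1.0, 2.0], [1.0, 1.0, 0.0]]
mask = ideal_binary_mask(speech, noise)
```

Cells marked 0 are the ones a missing feature recognizer would treat as unreliable. Lowering `lc_db` keeps more cells at the cost of admitting noisier ones.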
Original language | English (US) |
---|---|
Pages | 2066-2069 |
Number of pages | 4 |
State | Published - Dec 1 2010 |
Event | 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan; Duration: Sep 26 2010 → Sep 30 2010 |
Keywords
- Ideal binary mask classification
- Missing feature
- Robust speech recognition
ASJC Scopus subject areas
- Language and Linguistics
- Speech and Hearing