TY - GEN
T1 - Robust automatic speech recognition with decoder oriented ideal binary mask estimation
AU - Kim, Lae Hoon
AU - Kim, Kyung Tae
AU - Hasegawa-Johnson, Mark
PY - 2010
Y1 - 2010
N2 - In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation in the cepstral domain through a newly derived generalized expectation maximization algorithm. First, cepstral domain missing feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm to directly optimize the ASR performance. Experimental results show that even in a highly non-stationary mismatch condition (dance music as background noise), the proposed method achieves absolute ASR accuracy improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR over the conventional noise suppression method.
AB - In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation in the cepstral domain through a newly derived generalized expectation maximization algorithm. First, cepstral domain missing feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm to directly optimize the ASR performance. Experimental results show that even in a highly non-stationary mismatch condition (dance music as background noise), the proposed method achieves absolute ASR accuracy improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR over the conventional noise suppression method.
KW - Ideal binary mask classification
KW - Missing feature
KW - Robust speech recognition
UR - http://www.scopus.com/inward/record.url?scp=79959819577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959819577&partnerID=8YFLogxK
U2 - 10.21437/interspeech.2010-583
DO - 10.21437/interspeech.2010-583
M3 - Conference contribution
AN - SCOPUS:79959819577
T3 - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
SP - 2066
EP - 2069
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
PB - International Speech Communication Association
ER -