Robust automatic speech recognition with decoder oriented ideal binary mask estimation

Lae Hoon Kim, Kyung Tae Kim, Mark Allan Hasegawa-Johnson

Research output: Contribution to conference (Paper)

Abstract

In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation, transformed into the cepstral domain, through a newly derived generalized expectation maximization algorithm. First, cepstral-domain missing-feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm to optimize ASR performance directly. Experimental results show that even in a highly non-stationary mismatched condition (dance music as background noise), the proposed method achieves much higher ASR accuracy than the conventional noise suppression method, with absolute improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR.
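
For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal sketch of the textbook definitions only, not the paper's method. It assumes the common IBM definition (a time-frequency cell is kept when its local SNR exceeds a threshold) and shows missing-feature marginalization for a diagonal Gaussian in the spectral domain; the paper's cepstral-domain transformation and its generalized EM estimator are not reproduced here. The function names, the `lc_db` threshold parameter, and the toy values are all illustrative assumptions.

```python
import numpy as np


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1 where the local SNR (in dB) exceeds lc_db, else 0.

    Assumes the clean-speech and noise power spectra are both known,
    which is what makes the mask "ideal".
    """
    eps = 1e-12  # guards against division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.int8)


def marginal_log_likelihood(x, mask, mean, var):
    """Diagonal-Gaussian log-likelihood over reliable components only.

    Components with mask == 0 are treated as missing and integrated out,
    which for a diagonal covariance simply drops them from the sum.
    """
    reliable = mask.astype(bool)
    diff = x[reliable] - mean[reliable]
    v = var[reliable]
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * v) + diff * diff / v))


# Toy 2x2 power spectra (rows: frequency bins, columns: frames).
speech = np.array([[4.0, 0.1], [1.0, 9.0]])
noise = np.array([[1.0, 1.0], [2.0, 1.0]])
mask = ideal_binary_mask(speech, noise)

# One feature vector whose second component is corrupted and masked out.
loglik = marginal_log_likelihood(
    x=np.array([0.0, 100.0]),
    mask=np.array([1, 0]),
    mean=np.zeros(2),
    var=np.ones(2),
)
```

In this sketch the corrupted second component does not affect `loglik`, because masked components are excluded from the likelihood rather than being reconstructed.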

Original language: English (US)
Pages: 2066-2069
Number of pages: 4
State: Published - Dec 1 2010
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: Sep 26 2010 to Sep 30 2010

Keywords

  • Ideal binary mask classification
  • Missing feature
  • Robust speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Kim, L. H., Kim, K. T., & Hasegawa-Johnson, M. A. (2010). Robust automatic speech recognition with decoder oriented ideal binary mask estimation. 2066-2069. Paper presented at 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, Japan.

@conference{1730f0a5f3cb40c3af1167ad5602d328,
title = "Robust automatic speech recognition with decoder oriented ideal binary mask estimation",
abstract = "In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation, transformed into the cepstral domain, through a newly derived generalized expectation maximization algorithm. First, cepstral-domain missing-feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm to optimize ASR performance directly. Experimental results show that even in a highly non-stationary mismatched condition (dance music as background noise), the proposed method achieves much higher ASR accuracy than the conventional noise suppression method, with absolute improvements ranging from 14.69{\%} at 0 dB SNR to 40.10{\%} at 15 dB SNR.",
keywords = "Ideal binary mask classification, Missing feature, Robust speech recognition",
author = "Kim, {Lae Hoon} and Kim, {Kyung Tae} and Hasegawa-Johnson, {Mark Allan}",
year = "2010",
month = "12",
day = "1",
language = "English (US)",
pages = "2066--2069",
note = "11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 ; Conference date: 26-09-2010 Through 30-09-2010",

}

TY - CONF

T1 - Robust automatic speech recognition with decoder oriented ideal binary mask estimation

AU - Kim, Lae Hoon

AU - Kim, Kyung Tae

AU - Hasegawa-Johnson, Mark Allan

PY - 2010/12/1

Y1 - 2010/12/1

AB - In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation, transformed into the cepstral domain, through a newly derived generalized expectation maximization algorithm. First, cepstral-domain missing-feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm to optimize ASR performance directly. Experimental results show that even in a highly non-stationary mismatched condition (dance music as background noise), the proposed method achieves much higher ASR accuracy than the conventional noise suppression method, with absolute improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR.

KW - Ideal binary mask classification

KW - Missing feature

KW - Robust speech recognition

UR - http://www.scopus.com/inward/record.url?scp=79959819577&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959819577&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:79959819577

SP - 2066

EP - 2069

ER -