Robust automatic speech recognition with decoder oriented ideal binary mask estimation

Lae Hoon Kim, Kyung Tae Kim, Mark Allan Hasegawa-Johnson

Research output: Contribution to conferencePaperpeer-review

Abstract

In this paper, we propose a joint optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation in transformed into the cepstral domain through a newly derived generalized expectation maximization algorithm. First, cepstral domain missing feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm directly to optimize the ASR performance. Experimental results show that even in highly non-stationary mismatch condition (dance music as background noise), the proposed method achieves much higher absolute ASR accuracy improvement ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR compared with the conventional noise suppression method.

Original languageEnglish (US)
Pages2066-2069
Number of pages4
StatePublished - Dec 1 2010
Event11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: Sep 26 2010Sep 30 2010

Other

Other11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
CountryJapan
CityMakuhari, Chiba
Period9/26/109/30/10

Keywords

  • Ideal binary mask classification
  • Missing feature
  • Robust speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Fingerprint Dive into the research topics of 'Robust automatic speech recognition with decoder oriented ideal binary mask estimation'. Together they form a unique fingerprint.

Cite this