Robust automatic speech recognition with decoder oriented ideal binary mask estimation

Lae Hoon Kim, Kyung Tae Kim, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation, transformed into the cepstral domain through a newly derived generalized expectation maximization algorithm. First, cepstral domain missing feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated using a generalized expectation maximization algorithm to directly optimize the ASR performance. Experimental results show that even in a highly non-stationary mismatch condition (dance music as background noise), the proposed method achieves much higher ASR accuracy than the conventional noise suppression method, with absolute improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR.
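
For readers unfamiliar with the term, the sketch below illustrates the commonly used oracle definition of an ideal binary mask: a time-frequency cell is marked reliable (1) when its local SNR exceeds a threshold, and unreliable (0) otherwise. This is not the estimation method proposed in the paper, which has to infer the mask without access to the separate speech and noise signals. The function name `ideal_binary_mask`, the 0 dB default threshold, and the toy power values are assumptions made for illustration only.

```python
import math


def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask with one entry per time-frequency cell.

    speech_power and noise_power are equally shaped lists of rows
    (frames) holding non-negative power values per frequency bin.
    A cell is 1 when its local SNR in dB exceeds threshold_db.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                # No speech energy: the cell cannot be speech dominated.
                row.append(0)
            elif n <= 0.0:
                # Speech energy with no noise: treat the cell as reliable.
                row.append(1)
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Hypothetical 2-frame, 2-bin power spectra (illustrative values only).
speech = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mask = ideal_binary_mask(speech, noise)
```

In missing feature approaches, cells marked 0 are the ones treated as unreliable by the recognizer; how that is carried into the cepstral domain is the subject of the paper itself and is not reproduced here.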

Original language: English (US)
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 2066-2069
Number of pages: 4
State: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Ideal binary mask classification
  • Missing feature
  • Robust speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
