Optimal speech estimator considering room response as well as additive noise: Different approaches in low and high frequency range

Lae Hoon Kim, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes minimum mean squared error (MMSE) estimation of a speech signal in a reverberant space, using different optimal estimators in the low and high frequency ranges. At low frequencies, an MMSE spectral amplitude estimate divided by the spectral amplitude of a representative impulse response gives optimal performance. At high frequencies, the MMSE estimator is computed from its sufficient statistic, the maximum likelihood (ML) estimate. Inference is factored into a two-step algorithm: the ML value of the source spectrum is first estimated using expectation-maximization (EM), treating the room response as a hidden variable with a complex Gaussian pdf; the MMSE source spectral estimate is then computed from that ML value.
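As a rough illustration of the low-frequency idea only (estimate the reverberant spectral amplitude, then divide by the amplitude of a representative room response), the sketch below works per frequency bin. It is not the paper's estimator: a Wiener-type gain stands in for the MMSE spectral amplitude estimator, the a priori SNR is taken from a simple maximum likelihood rule, and the names `wiener_gain`, `low_freq_amplitude_estimate`, `xi_floor`, and `room_floor` are hypothetical.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi on a linear scale.

    Assumption: used here as a simple stand-in for an MMSE
    spectral amplitude gain; it is not the paper's estimator.
    """
    return xi / (1.0 + xi)


def low_freq_amplitude_estimate(noisy_amp, noise_psd, room_amp,
                                xi_floor=1e-3, room_floor=1e-3):
    """Estimate source spectral amplitudes, one value per frequency bin.

    noisy_amp : observed amplitudes |Y(k)|
    noise_psd : additive-noise power E|N(k)|^2 (assumed known)
    room_amp  : amplitudes |H(k)| of a representative room response
    xi_floor, room_floor : hypothetical floors that keep the gain
        positive and avoid dividing by a near-zero room amplitude
    """
    estimates = []
    for y, lam, h in zip(noisy_amp, noise_psd, room_amp):
        gamma = (y * y) / lam              # a posteriori SNR
        xi = max(gamma - 1.0, xi_floor)    # ML-style a priori SNR
        reverberant_amp = wiener_gain(xi) * y  # denoised reverberant amplitude
        estimates.append(reverberant_amp / max(h, room_floor))  # undo room gain
    return estimates


if __name__ == "__main__":
    # Two illustrative bins: one well above the noise, one below it.
    print(low_freq_amplitude_estimate([2.0, 0.5], [1.0, 1.0], [0.5, 1.0]))
```

The floor on `room_amp` reflects a general caution with channel inversion: dividing by a very small response amplitude amplifies estimation error, so a practical system would need some form of regularization.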

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Pages: 4573-4576
Number of pages: 4
State: Published - 2008
Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
Duration: Mar 31 2008 to Apr 4 2008

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Country/Territory: United States
City: Las Vegas, NV
Period: 3/31/08 to 4/4/08

Keywords

  • Channel inversion
  • Room response estimation
  • Signal enhancement
  • Statistical room response modeling

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
