Robust speaker identification using a CASA front-end

Xiaojia Zhao, Yang Shao, De Liang Wang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Speaker recognition remains a challenging task under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time-frequency mask. We first show that a recently introduced speaker feature, the gammatone frequency cepstral coefficient (GFCC), performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize the corrupted components indicated by the CASA mask. Both methods are effective. We further combine them into a single system that selects between them based on the detected signal-to-noise ratio (SNR). This system achieves significant performance improvements over related systems under a wide range of SNR conditions.
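The abstract names two ways of using the CASA binary mask at recognition time, marginalization and reconstruction, plus an SNR-dependent choice between them. The Python sketch below illustrates those ideas under stated assumptions and is not the authors' implementation. It assumes diagonal-covariance GMM speaker models over mask-aligned (spectral-domain) feature vectors, with one value per time-frequency unit in a frame. It also assumes full rather than bounded marginalization and unbounded cluster-based reconstruction from a speaker-independent prior GMM. The helper names (`DiagGMM`, `reconstruct`, `pick_method`, `identify`) and the threshold rule are hypothetical. The abstract does not say which method is used above or below the SNR threshold, so both are left as parameters. Converting reconstructed features to GFCCs is not shown.

```python
# Hypothetical sketch of mask-based marginalization vs. reconstruction for
# GMM speaker identification; not the paper's implementation.
import math
from typing import Dict, List, Sequence


class DiagGMM:
    """Diagonal-covariance Gaussian mixture (one per speaker, plus a prior)."""

    def __init__(self, weights: List[float], means: List[List[float]],
                 variances: List[List[float]]):
        self.weights = weights
        self.means = means
        self.variances = variances

    def component_log_density(self, x: Sequence[float], k: int,
                              dims: Sequence[int]) -> float:
        """Log density of component k, evaluated only on the dimensions in `dims`."""
        mu, var = self.means[k], self.variances[k]
        return sum(-0.5 * (math.log(2 * math.pi * var[d]) + (x[d] - mu[d]) ** 2 / var[d])
                   for d in dims)

    def log_likelihood(self, x: Sequence[float], dims: Sequence[int]) -> float:
        """log p(x restricted to dims). With diagonal covariances, integrating out
        the other dimensions simply drops them from each component's product."""
        terms = [math.log(w) + self.component_log_density(x, k, dims)
                 for k, w in enumerate(self.weights)]
        top = max(terms)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in terms))


def reliable_dims(mask: Sequence[int]) -> List[int]:
    """Indices of units the binary mask labels as speech-dominant (1)."""
    return [d for d, m in enumerate(mask) if m]


def reconstruct(x: Sequence[float], mask: Sequence[int], prior: DiagGMM) -> List[float]:
    """Replace unreliable units by their conditional mean under `prior`,
    given the reliable units (unbounded cluster-based imputation)."""
    rel = reliable_dims(mask)
    logs = [math.log(w) + prior.component_log_density(x, k, rel)
            for k, w in enumerate(prior.weights)]
    top = max(logs)
    post = [math.exp(l - top) for l in logs]
    total = sum(post)
    post = [p / total for p in post]  # P(component | reliable units)
    out = list(x)
    for d, keep in enumerate(mask):
        if not keep:
            out[d] = sum(p * prior.means[k][d] for k, p in enumerate(post))
    return out


def pick_method(snr_db: float, threshold_db: float, below: str, above: str) -> str:
    """Hypothetical rule: choose a method by comparing estimated SNR to a threshold."""
    return below if snr_db < threshold_db else above


def identify(frames: Sequence[Sequence[float]], masks: Sequence[Sequence[int]],
             speakers: Dict[str, DiagGMM], prior: DiagGMM, method: str) -> str:
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    scores = {}
    for name, gmm in speakers.items():
        total = 0.0
        for x, mask in zip(frames, masks):
            if method == "marginalize":
                total += gmm.log_likelihood(x, reliable_dims(mask))
            elif method == "reconstruct":
                full = reconstruct(x, mask, prior)
                total += gmm.log_likelihood(full, range(len(full)))
            else:
                raise ValueError(f"unknown method: {method}")
        scores[name] = total
    return max(scores, key=scores.get)
```

For example, with one-component speaker models centered at (0, 0) and (5, 5), a frame (0.1, 9.0) whose second unit is masked as noise-dominated is attributed to the first speaker by either method. Scoring all units (mask [1, 1]) instead favors the second speaker, which is the kind of error the mask is meant to prevent.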

Original language: English (US)
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 5468-5471
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2011.5947596
State: Published - Aug 18 2011
Externally published: Yes
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: May 22 2011 to May 27 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country: Czech Republic
City: Prague
Period: 5/22/11 to 5/27/11

Keywords

  • CASA
  • GFCC
  • Robust speaker identification
  • gammatone frequency cepstral coefficient
  • ideal binary mask

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering


Citation

  Zhao, X., Shao, Y., & Wang, D. L. (2011). Robust speaker identification using a CASA front-end. In 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings (pp. 5468-5471). [5947596] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2011.5947596