Robust analysis and weighting on MFCC components for speech recognition and speaker identification

Xi Zhou, Yun Fu, Ming Liu, Mark Hasegawa-Johnson, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Mismatch between training and testing data is a major source of error for both Automatic Speech Recognition (ASR) and Automatic Speaker Identification (ASI). In this paper, we first present a statistical weighting concept that exploits the unequal sensitivity of Mel-Frequency Cepstral Coefficient (MFCC) components to mismatch caused by factors such as ambient noise, recording equipment, transmission channels, and inter-speaker variation. Following this concept, we then design a new Kullback-Leibler (KL) distance based weighting algorithm for real-world problems in which label information is often not provided. We evaluate the algorithm on ASR with mismatch caused by different speakers and on ASI with mismatch caused by channel noise. Experimental results demonstrate the effectiveness and robustness of the proposed method.
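The abstract does not give the weighting formula, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each MFCC component can be modelled as a one-dimensional Gaussian, measures the symmetric KL divergence between the training and testing distributions of that component without using any labels, and gives smaller weights to components that shift more. The function names `gaussian_kl` and `kl_component_weights`, the `1 / (1 + divergence)` mapping, and the toy data are all assumptions made for this example.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two univariate Gaussians (closed form)."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def kl_component_weights(train_frames, test_frames, eps=1e-8):
    """Return one weight per feature component, normalised to sum to 1.

    train_frames, test_frames: lists of equal-length feature vectors
    (for example, MFCC frames). No class labels are needed.
    Assumption for this sketch: weight = 1 / (1 + symmetric KL), so a
    component whose distribution changes a lot between training and
    testing data contributes less.
    """
    n_components = len(train_frames[0])
    raw = []
    for k in range(n_components):
        a = [frame[k] for frame in train_frames]
        b = [frame[k] for frame in test_frames]
        # eps keeps the variances strictly positive.
        mu_a, var_a = fmean(a), pvariance(a) + eps
        mu_b, var_b = fmean(b), pvariance(b) + eps
        sym_kl = (gaussian_kl(mu_a, var_a, mu_b, var_b)
                  + gaussian_kl(mu_b, var_b, mu_a, var_a))
        raw.append(1.0 / (1.0 + sym_kl))
    total = sum(raw)
    return [r / total for r in raw]


# Toy data: component 0 is identical in both sets, component 1 is shifted.
train = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
test = [[0.0, 4.0], [1.0, 5.0], [2.0, 6.0]]
weights = kl_component_weights(train, test)
```

In this toy case the shifted second component receives the smaller weight. The resulting weights could then scale each component's contribution to a distance or likelihood score; how the paper applies its weights is not stated in the abstract.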

Original language: English (US)
Title of host publication: Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
Pages: 188-191
Number of pages: 4
State: Published - Dec 1 2007
Event: IEEE International Conference on Multimedia and Expo, ICME 2007 - Beijing, China
Duration: Jul 2 2007 → Jul 5 2007

Publication series

Name: Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007

Other

Other: IEEE International Conference on Multimedia and Expo, ICME 2007
Country/Territory: China
City: Beijing
Period: 7/2/07 → 7/5/07

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software
