Maximum conditional mutual information projection for speech recognition

Mohamed Kamal Omar, Mark Hasegawa-Johnson

Research output: Contribution to conference (Paper)

Abstract

Linear discriminant analysis (LDA) in its original model-free formulation is best suited to classification problems with equal-covariance classes. Heteroscedastic discriminant analysis (HDA) removes this equal-covariance constraint and is therefore more suitable for automatic speech recognition (ASR) systems. However, maximizing the HDA objective function does not correspond directly to minimizing the recognition error. In its original formulation, HDA solves a maximum likelihood estimation problem in the original feature space to calculate the HDA transformation matrix. Since the dimension of the original feature space in ASR problems is usually high, the estimation of the HDA transformation matrix becomes computationally expensive and requires a large amount of training data. This paper presents a generalization of LDA that solves these two problems. We start by showing that the calculation of the LDA projection matrix is a maximum mutual information estimation problem in the lower-dimensional space with some constraints on the model of the joint conditional and unconditional probability density functions (PDFs) of the features. Then, by relaxing these constraints, we develop a dimensionality reduction approach that maximizes the conditional mutual information between the class identity and the feature vector in the lower-dimensional space given the recognizer model. Using this approach, we achieved a 1% improvement in phoneme recognition accuracy compared to the baseline system. An improvement in recognition accuracy compared to both the LDA and HDA approaches is also achieved.
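For readers unfamiliar with the baseline that the abstract generalizes, the following is a minimal sketch of the classical LDA projection only, not the maximum conditional mutual information method of the paper. It assumes NumPy, and the function name `lda_projection`, its parameters, and the synthetic three-class data are all hypothetical choices made for illustration. The sketch whitens the features with the within-class scatter and keeps the leading eigenvectors of the whitened between-class scatter.

```python
import numpy as np


def lda_projection(X, y, p):
    """Return a (d, p) matrix whose columns span the classical LDA subspace.

    Hypothetical helper for illustration; X is (n, d), y holds class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Whiten with Sw^{-1/2}, then diagonalize the whitened between-class scatter.
    w_vals, w_vecs = np.linalg.eigh(Sw)
    w_vals = np.maximum(w_vals, 1e-10)  # guard against a near-singular Sw
    whiten = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
    b_vals, b_vecs = np.linalg.eigh(whiten @ Sb @ whiten)
    order = np.argsort(b_vals)[::-1][:p]  # indices of the p largest eigenvalues
    return whiten @ b_vecs[:, order]


# Synthetic equal-covariance data (an assumption made for this demo only).
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0], [4, 0, 0, 0, 0], [0, 4, 0, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(200, 5)) for m in means])
y = np.repeat(np.arange(3), 200)

W = lda_projection(X, y, p=2)
Z = X @ W  # features projected into the 2-dimensional space
```

The equal-covariance data here matches the setting in which the abstract says LDA is best suited; with class-dependent covariances this projection may discard discriminative directions, which is the limitation that HDA and the approach described above are meant to address.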

Original language: English (US)
Pages: 505-508
Number of pages: 4
State: Published - Jan 1 2003
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: Sep 1 2003 to Sep 4 2003

Other

Other: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
Country: Switzerland
City: Geneva
Period: 9/1/03 to 9/4/03

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication

  • Cite this

    Omar, M. K., & Hasegawa-Johnson, M. (2003). Maximum conditional mutual information projection for speech recognition. 505-508. Paper presented at 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, Geneva, Switzerland.