Bimodal speech recognition using coupled hidden Markov models

Stephen M. Chu, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences between their hidden state variables. The coupling probabilities are both cross-chain and cross-time. The latter is essential for allowing temporal influences between chains, which is important in modeling bimodal speech. Our bimodal speech recognition system employs a two-chain CHMM, with one chain associated with the acoustic observations and the other with the visual features. A deterministic approximation for maximum a posteriori (MAP) estimation is used to enable fast classification and parameter estimation. We evaluated the system on a speaker-independent connected-digit task. Compared with an acoustic-only ASR system trained using only the audio channel of the same database, the bimodal system consistently demonstrates improved noise robustness at all SNRs. We further compare the CHMM system reported in this paper with our earlier bimodal speech recognition system, in which the two modalities are fused by concatenating the audio and visual features. The recognition results clearly show the advantages of the CHMM framework in the context of bimodal speech recognition.
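To make the cross-chain, cross-time coupling described in the abstract concrete, the following is a minimal Python sketch of a two-chain CHMM in which each chain's next state is conditioned on the previous states of both chains. Everything in it is an assumption made for illustration: the function name `chmm_forward_loglik`, the full conditional tables `trans_a` and `trans_v`, and the use of an exact forward pass over the joint state. The paper itself uses a deterministic approximation for MAP estimation, which is not reproduced here.

```python
import math
from itertools import product


def chmm_forward_loglik(pi_a, pi_v, trans_a, trans_v, lik_a, lik_v):
    """Log-likelihood of a two-chain CHMM via an exact joint-state forward pass.

    Hypothetical parameterization (not taken from the paper):
      pi_a[i], pi_v[j]    initial state probabilities of the audio / visual chain
      trans_a[i][j][k]    P(a_t = k | a_{t-1} = i, v_{t-1} = j)
      trans_v[i][j][l]    P(v_t = l | a_{t-1} = i, v_{t-1} = j)
      lik_a[t][k]         likelihood of the audio frame at time t in state k
      lik_v[t][l]         likelihood of the visual frame at time t in state l
    """
    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initial forward variable over joint states (a, v).
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * lik_a[0][i] * lik_v[0][j]
        for i, j in states
    }

    log_lik = 0.0
    for t in range(len(lik_a)):
        if t > 0:
            prev = alpha
            # Each chain's new state depends on BOTH chains' previous states:
            # this is the cross-chain, cross-time coupling.
            alpha = {
                (k, l): lik_a[t][k] * lik_v[t][l] * sum(
                    prev[(i, j)] * trans_a[i][j][k] * trans_v[i][j][l]
                    for i, j in states
                )
                for k, l in states
            }
        # Rescale at every frame to avoid numerical underflow.
        scale = sum(alpha.values())
        log_lik += math.log(scale)
        alpha = {s: p / scale for s, p in alpha.items()}
    return log_lik
```

The exact pass costs on the order of (N_a N_v)^2 operations per frame for N_a audio states and N_v visual states, which gives some intuition for why a faster approximate inference scheme is attractive. In a classifier, one such model would typically be scored per word or digit and the highest-scoring model selected.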

Original language: English (US)
Title of host publication: 6th International Conference on Spoken Language Processing, ICSLP 2000
Publisher: International Speech Communication Association
ISBN (Electronic): 7801501144, 9787801501141
State: Published - 2000
Externally published: Yes
Event: 6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China
Duration: Oct 16 2000 to Oct 20 2000

Publication series

Name: 6th International Conference on Spoken Language Processing, ICSLP 2000

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
