Audio-visual speech modeling using coupled hidden Markov models

Stephen M. Chu, Thomas S. Huang

Research output: Contribution to journal, Conference article, peer-review


In this work we consider the bimodal fusion problem in audio-visual speech recognition. A novel sensory fusion architecture based on coupled hidden Markov models (CHMMs) is presented. CHMMs are directed graphical models of stochastic processes and are a special type of dynamic Bayesian network. The proposed fusion architecture allows us to address the statistical modeling and the fusion of audio-visual speech in a unified framework. Furthermore, the architecture is capable of capturing the asynchronous and temporal inter-modal dependencies between the two information channels. We describe a model transformation strategy to facilitate inference and learning in CHMMs. Results from audio-visual speech recognition experiments confirm the superior capability of the proposed fusion architecture.
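The abstract does not say which model transformation the authors use. As an illustration only, the sketch below shows one common way to make CHMM inference tractable with standard tools: flattening a two-chain CHMM into an ordinary HMM over joint (audio, visual) states and running the scaled forward algorithm on it. All names (chmm_to_hmm, forward_loglik, A, V, Ba, Bv) and the toy parameters are hypothetical, and discrete observation symbols are assumed for brevity, whereas real audio-visual features would typically be continuous.

```python
import math
from itertools import product

def chmm_to_hmm(pi_a, pi_v, A, V, Ba, Bv):
    """Flatten a two-chain CHMM into an HMM over joint states (i, j).

    A[i][j][k] = P(audio state k at t  | audio i, visual j at t-1)
    V[i][j][l] = P(visual state l at t | audio i, visual j at t-1)
    Ba[k][o], Bv[l][o] = per-chain discrete emission probabilities.
    """
    states = list(product(range(len(pi_a)), range(len(pi_v))))
    pi = {(i, j): pi_a[i] * pi_v[j] for i, j in states}
    # Each chain's next state depends on BOTH previous states; given the
    # previous joint state, the two chains move independently.
    T = {(s, t): A[s[0]][s[1]][t[0]] * V[s[0]][s[1]][t[1]]
         for s in states for t in states}

    def emit(state, oa, ov):
        # Each chain emits its own observation from its own state.
        return Ba[state[0]][oa] * Bv[state[1]][ov]

    return states, pi, T, emit

def forward_loglik(states, pi, T, emit, obs):
    """Scaled forward pass; obs is a list of (audio_symbol, visual_symbol)."""
    alpha = {s: pi[s] * emit(s, *obs[0]) for s in states}
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = {s: sum(alpha[r] * T[(r, s)] for r in states)
                        * emit(s, *obs[t]) for s in states}
        scale = sum(alpha.values())          # rescale to avoid underflow
        loglik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return loglik

# Toy parameters (assumed, not from the paper): 2 states and 2 symbols per chain.
pi_a, pi_v = [0.6, 0.4], [0.5, 0.5]
A = [[[0.8, 0.2], [0.6, 0.4]], [[0.3, 0.7], [0.1, 0.9]]]
V = [[[0.7, 0.3], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
Ba = [[0.9, 0.1], [0.2, 0.8]]
Bv = [[0.7, 0.3], [0.4, 0.6]]

states, pi, T, emit = chmm_to_hmm(pi_a, pi_v, A, V, Ba, Bv)
print(forward_loglik(states, pi, T, emit, [(0, 0), (0, 1), (1, 1)]))
```

Because each chain's transition is conditioned on the previous state of both chains, the joint model can represent the two streams drifting out of step with each other. The cost of this particular flattening is a state space that grows as the product of the per-chain state counts.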

Original language: English (US)
Pages (from-to): II/2009-II/2012
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
State: Published - 2002
Event: 2002 IEEE International Conference on Acoustics, Speech and Signal Processing - Orlando, FL, United States
Duration: May 13 2002 to May 17 2002

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

