Abstract
In this work we consider the bimodal fusion problem in audio-visual speech recognition. A novel sensory fusion architecture based on coupled hidden Markov models (CHMMs) is presented. CHMMs are directed graphical models of stochastic processes and are a special class of dynamic Bayesian networks. The proposed fusion architecture allows us to address the statistical modeling and the fusion of audio-visual speech in a unified framework. Furthermore, the architecture is capable of capturing the asynchrony and the temporal inter-modal dependencies between the two information channels. We describe a model transformation strategy to facilitate inference and learning in CHMMs. Results from audio-visual speech recognition experiments confirm the superior capability of the proposed fusion architecture.
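To make the architecture concrete, the sketch below illustrates, under stated assumptions, the kind of two-chain coupling the abstract describes: each chain's next state depends on the previous states of both the audio and the video chains, and the coupled model can be folded into an equivalent HMM over product states so that standard inference routines apply. This is a minimal illustration, not the authors' implementation; the state counts `Na`, `Nv` and the helper `random_cpd` are hypothetical, and the product-state folding is one common transformation, which may or may not coincide with the paper's model transformation strategy.

```python
# Minimal sketch (assumed, not the paper's code) of a two-chain coupled HMM
# and its folding into a product-state HMM for inference with standard tools.
import numpy as np

rng = np.random.default_rng(0)

Na, Nv = 3, 3  # hypothetical numbers of audio and video states

def random_cpd(shape):
    """Random conditional distribution, normalized over the last axis."""
    p = rng.random(shape)
    return p / p.sum(axis=-1, keepdims=True)

# Cross-coupled transition probabilities:
#   A_a[i, j, k] = P(q_a(t)=k | q_a(t-1)=i, q_v(t-1)=j)
#   A_v[i, j, l] = P(q_v(t)=l | q_a(t-1)=i, q_v(t-1)=j)
A_a = random_cpd((Na, Nv, Na))
A_v = random_cpd((Na, Nv, Nv))

# Fold the two chains into a single HMM over paired states (q_a, q_v):
#   T[(i,j), (k,l)] = A_a[i, j, k] * A_v[i, j, l]
T = np.einsum('ijk,ijl->ijkl', A_a, A_v).reshape(Na * Nv, Na * Nv)

assert np.allclose(T.sum(axis=1), 1.0)  # each row is a valid distribution
print(T.shape)  # (9, 9) product-state transition matrix
```

With the product-state matrix in hand, forward-backward or Viterbi decoding can be run exactly as for an ordinary HMM, at the cost of a state space that is the product of the two chains' state spaces.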
| Original language | English (US) |
|---|---|
| Pages (from-to) | II/2009-II/2012 |
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| Volume | 2 |
| DOIs | |
| State | Published - 2002 |
| Externally published | Yes |
| Event | 2002 IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, FL, United States, May 13-17, 2002 |
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering