Abstract
In this paper, the audio and visual features of speech are integrated using a novel fused-HMM. We assume that the two sets of features may have different data rates and durations. Hidden Markov models (HMMs) are first used to model them separately, and a general Bayesian fusion method, which is optimal in the maximum entropy sense, is then employed to fuse them. In particular, an efficient learning algorithm is introduced: instead of maximizing the joint likelihood of the fused-HMM, it maximizes the likelihoods of the two HMMs separately and then fuses the HMMs. An inference algorithm is also proposed. We tested the proposed method in person verification experiments. Results show that it significantly reduces recognition error rates compared with the unimodal HMMs and the loosely coupled fusion model.
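The "train separately, then fuse" idea can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so the coupling used here is an assumption: the second stream is scored conditioned on the Viterbi state path of the first stream's HMM. The sketch also assumes discrete observations, equal data rates for both streams, and HMM parameters that are already given rather than learned. All names (`forward_loglik`, `viterbi`, `fit_coupling`, `fused_loglik`) and the toy numbers are hypothetical.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Scaled forward algorithm: log p(obs) under a discrete HMM."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

def viterbi(pi, A, B, obs):
    """Most likely hidden-state path under a discrete HMM."""
    log_A = np.log(A)
    delta = np.log(pi) + np.log(B[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + log_A        # scores[i, j]: state i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + np.log(B[:, o])
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

def fit_coupling(paths, streams, n_states, n_symbols, smooth=1.0):
    """Assumed fusion step: count-based estimate of p(o2_t | u1_t)."""
    counts = np.full((n_states, n_symbols), smooth)
    for path, obs2 in zip(paths, streams):
        for u, o in zip(path, obs2):
            counts[u, o] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def fused_loglik(hmm1, coupling, obs1, obs2):
    """log p(obs1) + sum_t log p(obs2_t | Viterbi state of stream 1 at t)."""
    path = viterbi(*hmm1, obs1)
    cross = sum(np.log(coupling[u, o]) for u, o in zip(path, obs2))
    return forward_loglik(*hmm1, obs1) + cross

# Toy parameters standing in for a separately trained audio HMM (assumed).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
hmm_audio = (pi, A, B)

audio = [0, 0, 1, 1]   # quantized audio symbols (toy data)
video = [0, 0, 1, 1]   # quantized visual symbols (toy data)

path = viterbi(*hmm_audio, audio)
coupling = fit_coupling([path], [video], n_states=2, n_symbols=2)
score = fused_loglik(hmm_audio, coupling, audio, video)
print(score)
```

In a verification setting, a score of this kind would be compared against a threshold or a competing model's score. Handling streams with different data rates, as the abstract allows, would require aligning the state path with the second stream before counting, which this sketch omits.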
Original language | English (US) |
---|---|
Title of host publication | IEEE International Conference on Image Processing |
Volume | 3 |
State | Published - Dec 1 2000 |
Event | International Conference on Image Processing (ICIP 2000) - Vancouver, BC, Canada. Duration: Sep 10 2000 → Sep 13 2000 |
Other
Other | International Conference on Image Processing (ICIP 2000) |
---|---|
Country | Canada |
City | Vancouver, BC |
Period | 9/10/00 → 9/13/00 |
ASJC Scopus subject areas
- Hardware and Architecture
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering