A new approach to integrate audio and visual features of speech

H. Pan, Zhi-Pei Liang, Thomas S Huang

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

This paper presents a novel fused hidden Markov model (fused-HMM) for integrating the audio and visual features of speech. In this model, individually trained audio and visual HMMs are fused by a general probabilistic fusion method that is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observations to infer the dependencies between audio and video. The learning and inference algorithms described in the paper handle audio and video features with different data rates and durations. In speaker verification experiments, the proposed method significantly reduces the recognition error rate compared with unimodal HMMs and simpler fusion methods.
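The fusion rule described in the abstract can be read as scoring a joint observation by the audio HMM likelihood plus a coupling term that conditions each visual observation on the decoded audio hidden state, roughly log p(O_a) + sum_t log p(o_v,t | s_a(t)). Below is a minimal sketch of that idea in Python, assuming hmmlearn for the unimodal audio HMM. The per-state Gaussian coupling and the align_visual_to_audio helper are illustrative simplifications introduced here, not the paper's exact maximum-entropy construction or its EM learning algorithm.

```python
# Sketch of fused-HMM-style scoring: an audio HMM plus a per-state
# coupling model for visual observations. Assumes hmmlearn is installed.
import numpy as np
from hmmlearn import hmm

def align_visual_to_audio(n_audio_frames, n_visual_frames):
    """Map each visual frame to the nearest audio frame index, so that
    streams with different data rates can be coupled."""
    return np.round(
        np.linspace(0, n_audio_frames - 1, n_visual_frames)
    ).astype(int)

def fit_fused_hmm(audio_feats, visual_feats, n_states=5):
    # 1. Train the audio HMM on its own (the unimodal models in the
    #    paper are likewise built individually).
    audio_hmm = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=20)
    audio_hmm.fit(audio_feats)

    # 2. Decode the audio hidden states and align visual frames to them.
    states = audio_hmm.predict(audio_feats)
    idx = align_visual_to_audio(len(audio_feats), len(visual_feats))
    v_states = states[idx]  # audio state responsible for each visual frame

    # 3. Model p(visual obs | audio state) with one diagonal Gaussian per
    #    state -- a simple stand-in for the paper's coupling distribution.
    coupling = {}
    for s in range(n_states):
        v = visual_feats[v_states == s]
        if len(v) > 1:
            coupling[s] = (v.mean(axis=0), v.std(axis=0) + 1e-6)
    return audio_hmm, coupling

def fused_log_likelihood(audio_hmm, coupling, audio_feats, visual_feats):
    # log p(audio) + sum over visual frames of
    # log p(visual_t | decoded audio state at the aligned time)
    ll = audio_hmm.score(audio_feats)
    states = audio_hmm.predict(audio_feats)
    idx = align_visual_to_audio(len(audio_feats), len(visual_feats))
    for v, s in zip(visual_feats, states[idx]):
        if s in coupling:
            mu, sigma = coupling[s]
            ll += -0.5 * np.sum(((v - mu) / sigma) ** 2
                                + np.log(2 * np.pi * sigma ** 2))
    return ll
```

Fusing in the other direction (visual hidden states conditioning audio observations) would follow the same pattern, and the nearest-frame alignment is what lets the two streams run at different data rates, as the abstract requires. For speaker verification, the fused log-likelihood of a test utterance would be compared against a threshold or against competing speaker models.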

Original language: English (US)
Title of host publication: IEEE International Conference on Multi-Media and Expo
Pages: 1093-1096
Number of pages: 4
Edition: II/TUESDAY
State: Published - Dec 1 2000
Event: 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States
Duration: Jul 30 2000 - Aug 2 2000


ASJC Scopus subject areas

  • Engineering (all)


  • Cite this

    Pan, H., Liang, Z.-P., & Huang, T. S. (2000). A new approach to integrate audio and visual features of speech. In IEEE International Conference on Multi-Media and Expo (II/TUESDAY ed., pp. 1093-1096).