Boosted audio-visual HMM for speech reading

Pei Yin, Irfan Essa, James M. Rehg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM used to model phonemes from the acoustic signal, and (b) an HMM used to model visual feature motions from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost on the basis of their discriminant ability. This form of integration, leading to a boosted HMM, permits AdaBoost to find the best features first, and then uses the HMM to exploit the dynamic information inherent in the signal.
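The late-fusion step described above can be sketched as a weighted combination of the two streams' HMM log-likelihoods, with recognition picking the highest-scoring phoneme class. This is a minimal illustrative sketch, not the paper's implementation: the log-likelihood values, the phoneme set, and the stream weight `w` are all assumed for the example.

```python
# Hypothetical per-phoneme log-likelihoods, as would be produced by an
# acoustic HMM and a visual HMM evaluated on the same utterance segment.
# (Values are illustrative only, not taken from the paper.)
audio_loglik = {"p": -12.0, "b": -10.5, "m": -14.2}
visual_loglik = {"p": -8.1, "b": -9.7, "m": -7.9}

def fuse_streams(audio, visual, w=0.7):
    """Late fusion of two HMM streams: a weighted sum of log-likelihoods.
    `w` is an assumed acoustic-stream weight in [0, 1]."""
    return {ph: w * audio[ph] + (1.0 - w) * visual[ph] for ph in audio}

scores = fuse_streams(audio_loglik, visual_loglik)
recognized = max(scores, key=scores.get)  # phoneme with highest fused score
```

With the example numbers above, the visual stream pulls the decision toward a class the acoustic stream alone would have ranked lower, which is the intuition behind audio-visual integration for speech reading.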

Original language: English (US)
Title of host publication: IEEE International Workshop on Analysis and Modeling of Faces and Gestures, AMFG 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 68-73
Number of pages: 6
ISBN (Electronic): 0769520103, 9780769520100
State: Published - 2003
Externally published: Yes
Event: 2003 IEEE International Workshop on Analysis and Modeling of Faces and Gestures, AMFG 2003 - Nice, France
Duration: Oct 17 2003 → …

Publication series

Name: IEEE International Workshop on Analysis and Modeling of Faces and Gestures, AMFG 2003

Conference

Conference: 2003 IEEE International Workshop on Analysis and Modeling of Faces and Gestures, AMFG 2003
Country/Territory: France
City: Nice
Period: 10/17/03 → …

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Modeling and Simulation

