Boosted audio-visual HMM for speech reading

Pei Yin, Irfan Essa, James M. Rehg

Research output: Contribution to journal › Conference article › peer-review

Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM used to model phonemes from the acoustic signal, and (b) an HMM used to model the motion of visual features from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost on the basis of their discriminant ability. This form of integration, leading to a boosted HMM, lets AdaBoost find the best features first and then uses the HMM to exploit the dynamic information inherent in the signal.
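
The two-stage idea in the abstract (select discriminative features with AdaBoost, then score sequences with HMMs and pick the most likely model) can be illustrated with a minimal sketch. This is not the authors' implementation: the decision-stump weak learner, the discrete-symbol HMM, the toy data, and all names (`adaboost_select`, `log_likelihood`, the `open`/`closed` models) are assumptions made for illustration only.

```python
import math

def adaboost_select(X, y, rounds):
    """Return indices of features picked by AdaBoost with threshold stumps.
    X: list of feature vectors; y: labels in {-1, +1}. (Hypothetical helper.)"""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # uniform sample weights to start
    chosen = []
    for _ in range(rounds):
        best = None                        # (weighted error, feature, threshold, polarity)
        for j in range(d):
            for thr in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if (pol if row[j] >= thr else -pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        # Re-weight so that misclassified samples count more in the next round.
        w = [wi * math.exp(-alpha * yi * (pol if row[j] >= thr else -pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        if j not in chosen:
            chosen.append(j)
    return chosen

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM: log P(obs | model)."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
                     for s in states]
        scale = sum(alpha)                 # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Toy per-frame data (assumed): feature 1 separates the classes, feature 0 does not.
X = [[0.3, 0.1], [0.9, 0.2], [0.5, 0.8], [0.1, 0.9]]
y = [-1, -1, 1, 1]
selected = adaboost_select(X, y, rounds=1)

# Two toy HMMs over symbols {0, 1}, e.g. a quantized selected feature (assumed).
models = {
    "open":   ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "closed": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
obs = [0, 0, 0, 1]
best_model = max(models, key=lambda name: log_likelihood(obs, *models[name]))
print(selected, best_model)
```

In this sketch the boosting stage only decides which feature dimensions are passed on; the temporal modelling is left entirely to the HMMs, which mirrors the division of labour the abstract describes. A real system would more likely use continuous-density HMMs over the selected features rather than the quantized symbols assumed here.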

Original language: English (US)
Pages (from-to): 2013-2018
Number of pages: 6
Journal: Conference Record of the Asilomar Conference on Signals, Systems and Computers
Volume: 2
State: Published - 2003
Externally published: Yes
Event: Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers - Pacific Grove, CA, United States
Duration: Nov 9 2003 → Nov 12 2003

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications
