Real-time lip tracking and bimodal continuous speech recognition

  • M. T. Chan
  • Y. Zhang
  • T. S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We investigate a bimodal approach to speech recognition that incorporates additional visual features derived from the speaker's lip movements. A reference contour model is used to track the speaker's lip outline. By using color, constraining the deformation to an affine subspace, and incorporating an outlier rejection mechanism, our system is robust and runs in real time. To address the model initialization issue, a fast lip localization algorithm is also incorporated. A sample of continuous bimodal speech data with a restricted vocabulary (suited to our application area) was captured synchronously for training and testing. Using the hidden Markov modeling framework, we trained our bimodal, context-dependent, sub-word-based recognizer in several configurations. The experiments show that the bimodal recognizer compares favorably with its acoustic-only counterpart. The results also indicate that it is advantageous to include the first derivatives of the visual features. Furthermore, the 2-stream modeling scheme appears to be preferable to the 1-stream scheme for bimodal speech.
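The abstract mentions two modeling choices: appending first derivatives of the visual features, and scoring audio and visual features as one stream or as two. The record does not say how the derivatives were computed or how the streams were weighted, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common regression formula for delta features, a single diagonal-covariance Gaussian per HMM state, and hypothetical stream weights `w_audio`/`w_visual`; all function names and the two "lip features" in the usage lines are illustrative.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def delta_features(frames: Sequence[Frame], window: int = 2) -> List[List[float]]:
    """First derivatives per feature dimension (assumed regression formula):
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2).
    Edge frames are replicated, so there is one delta vector per input frame."""
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        d = []
        for i in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                acc += k * (frames[min(t + k, n - 1)][i] - frames[max(t - k, 0)][i])
            d.append(acc / denom)
        out.append(d)
    return out


def diag_gauss_logpdf(x: Frame, mean: Frame, var: Frame) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def one_stream_score(audio: Frame, visual: Frame, mean: Frame, var: Frame) -> float:
    """1-stream: audio and visual vectors are concatenated and scored by one density."""
    return diag_gauss_logpdf(list(audio) + list(visual), mean, var)


def two_stream_score(audio: Frame, visual: Frame, a_params, v_params,
                     w_audio: float = 1.0, w_visual: float = 1.0) -> float:
    """2-stream: each modality has its own (mean, var); the per-stream log scores
    are combined with hypothetical stream weights."""
    return (w_audio * diag_gauss_logpdf(audio, *a_params)
            + w_visual * diag_gauss_logpdf(visual, *v_params))


# Hypothetical 2-D lip features (e.g. mouth width, height) over 5 frames.
lip = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
# Static features followed by their first derivatives.
lip_with_deltas = [list(f) + d for f, d in zip(lip, delta_features(lip))]
# lip_with_deltas[2] == [2.0, 1.0, 1.0, 0.0]
```

With unit weights and a single diagonal Gaussian, the two scores coincide, because a diagonal density factorizes across dimensions. In this sketch the schemes diverge only when the stream weights differ from 1; in typical multi-stream HMM toolkits, each stream also gets its own Gaussian mixture, which is a further source of difference.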

Original language: English (US)
Title of host publication: 1998 IEEE 2nd Workshop on Multimedia Signal Processing
Editors: Abeer Alwan, Antonio Ortega, C.-C. Jay Kuo, C.L. Max Nikias, Ping Wah Wong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 65-70
Number of pages: 6
ISBN (Electronic): 0780349199, 9780780349193
State: Published - 1998
Externally published: Yes
Event: 2nd IEEE Workshop on Multimedia Signal Processing, MMSP 1998 - Redondo Beach, United States
Duration: Dec 7 1998 to Dec 9 1998

Publication series

Name: 1998 IEEE 2nd Workshop on Multimedia Signal Processing
Volume: 1998-December

Other

Other: 2nd IEEE Workshop on Multimedia Signal Processing, MMSP 1998
Country/Territory: United States
City: Redondo Beach
Period: 12/7/98 to 12/9/98

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology

