A multi-stream approach to audiovisual automatic speech recognition

Abstract

This paper proposes a multi-stream approach to automatic audiovisual speech recognition, based in part on Hickok and Poeppel's dual-stream model of human speech processing. The dual-stream model proposes that semantic networks may be accessed by at least three parallel neural streams: at least two ventral streams that map directly from acoustics to words (with different time scales), and at least one dorsal stream that maps from acoustics to articulation. Our implementation represents each of these streams by a dynamic Bayesian network; disagreements between the three streams are resolved using a voting scheme. The proposed algorithm was tested using the CUAVE audiovisual speech corpus. Results indicate that the ventral stream model tends to make fewer mistakes in the labeling of vowels, while the dorsal stream model tends to make fewer mistakes in the labeling of consonants; the recognizer voting scheme takes advantage of these differences to reduce overall word error rate.
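The abstract states that disagreements between the three stream models are resolved by a voting scheme, but does not spell out its form. A minimal sketch of one plausible reading, simple majority voting over the word hypotheses produced by the streams, is below; the function name, the stream labels, and the tie-breaking behavior are illustrative assumptions, not the paper's actual scheme.

```python
# Hypothetical sketch of resolving disagreements between parallel
# recognition streams by majority vote. The paper's actual voting
# scheme and the DBN scoring behind each hypothesis are not given
# in the abstract; this is an illustrative assumption.

from collections import Counter

def vote(stream_hypotheses):
    """Return the word label proposed by the most streams.

    stream_hypotheses: list of word labels, one per stream
    (e.g. two ventral streams and one dorsal stream).
    Ties go to the label that appears first in the list.
    """
    counts = Counter(stream_hypotheses)
    return counts.most_common(1)[0][0]

# Example: two ventral streams agree on "nine"; the dorsal stream
# disagrees, so the majority label wins.
print(vote(["nine", "nine", "five"]))  # -> nine
```

With only three streams, majority voting already exploits the complementary error patterns the abstract reports (ventral streams stronger on vowels, the dorsal stream on consonants): whenever any two streams agree, their shared label wins.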

Original language: English (US)
Title of host publication: 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings
Pages: 328-331
Number of pages: 4
DOIs
State: Published - 2007
Event: 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Chania, Crete, Greece
Duration: Oct 1, 2007 - Oct 3, 2007

Publication series

Name: 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings

Other

Other: 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007
Country/Territory: Greece
City: Chania, Crete
Period: 10/1/07 - 10/3/07

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Signal Processing
