TY - GEN
T1 - A multi-stream approach to audiovisual automatic speech recognition
AU - Hasegawa-Johnson, Mark
PY - 2007
Y1 - 2007
AB - This paper proposes a multi-stream approach to automatic audiovisual speech recognition, based in part on Hickok and Poeppel's dual-stream model of human speech processing. The dual-stream model proposes that semantic networks may be accessed by at least three parallel neural streams: at least two ventral streams that map directly from acoustics to words (with different time scales), and at least one dorsal stream that maps from acoustics to articulation. Our implementation represents each of these streams by a dynamic Bayesian network; disagreements between the three streams are resolved using a voting scheme. The proposed algorithm was tested using the CUAVE audiovisual speech corpus. Results indicate that the ventral stream model tends to make fewer mistakes in the labeling of vowels, while the dorsal stream model tends to make fewer mistakes in the labeling of consonants; the recognizer voting scheme takes advantage of these differences to reduce overall word error rate.
UR - http://www.scopus.com/inward/record.url?scp=48149114113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=48149114113&partnerID=8YFLogxK
U2 - 10.1109/MMSP.2007.4412884
DO - 10.1109/MMSP.2007.4412884
M3 - Conference contribution
AN - SCOPUS:48149114113
SN - 1424412749
SN - 9781424412747
T3 - 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings
SP - 328
EP - 331
BT - 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings
T2 - 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007
Y2 - 1 October 2007 through 3 October 2007
ER -