Abstract
We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events in a collaborative manner such that inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, modeling of inter-modal correlations, and extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise from a noise database. Consistent improvements in word recognition accuracy (WRA) are achieved using a cross-validation scheme over different signal-to-noise ratios.
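The decision-level ("late") fusion strategy the abstract alludes to can be sketched as a weighted combination of per-modality recognition scores. This is a minimal illustrative sketch, not the paper's actual method: the words, scores, and weighting scheme below are assumptions for demonstration.

```python
# Hypothetical per-word log-likelihoods from two independent
# single-modality recognizers; the words and numbers are
# illustrative, not taken from the paper.
audio_scores = {"yes": -4.2, "no": -5.1, "stop": -6.0}
visual_scores = {"yes": -3.8, "no": -3.5, "stop": -7.2}

def late_fusion(audio, visual, audio_weight=0.7):
    """Combine per-word log-likelihoods from two modalities.

    A weighted sum of log-likelihoods corresponds to assuming the
    modalities are conditionally independent given the word, with an
    exponent weighting the audio stream's reliability (a weight one
    might lower as acoustic noise increases).
    """
    fused = {}
    for word in audio:
        fused[word] = audio_weight * audio[word] + (1.0 - audio_weight) * visual[word]
    return max(fused, key=fused.get)

# In clean conditions the audio stream is trusted more; in noise,
# the visual stream's weight is raised and the decision can change.
print(late_fusion(audio_scores, visual_scores, audio_weight=0.7))  # → yes
print(late_fusion(audio_scores, visual_scores, audio_weight=0.2))  # → no
```

The point of the sketch is only the weighting mechanism: shifting reliability toward the visual stream at low signal-to-noise ratios is what lets an audio-visual recognizer degrade more gracefully than an audio-only one.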
| Original language | English (US) |
| --- | --- |
| Pages | 1073-1076 |
| Number of pages | 4 |
| State | Published - Dec 1 2000 |
| Event | 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States; Duration: Jul 30 2000 → Aug 2 2000 |
Other
| Other | 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) |
| --- | --- |
| Country | United States |
| City | New York, NY |
| Period | 7/30/00 → 8/2/00 |
ASJC Scopus subject areas
- Engineering (all)
Cite this
Speaker independent audio-visual speech recognition. / Zhang, Yuanhui; Levinson, Stephen E.; Huang, Thomas S.
2000. 1073-1076 Paper presented at 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), New York, NY, United States. Research output: Contribution to conference › Paper
TY - CONF
T1 - Speaker independent audio-visual speech recognition
AU - Zhang, Yuanhui
AU - Levinson, Stephen E
AU - Huang, Thomas S
PY - 2000/12/1
Y1 - 2000/12/1
N2 - We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events in a collaborative manner such that inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, modeling of inter-modal correlations, and extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise from a noise database. Consistent improvements in word recognition accuracy (WRA) are achieved using a cross-validation scheme over different signal-to-noise ratios.
AB - We present a general framework for integrating multimodal sensory signals for spatio-temporal pattern recognition. Statistical methods are used to model time-varying events in a collaborative manner such that inter-modal co-occurrences are taken into account. We discuss various data fusion strategies, modeling of inter-modal correlations, and extraction of statistical parameters for multimodal models. A bimodal speech recognition system is implemented. A speaker-independent experiment is carried out to test the audio-visual speech recognizer under different kinds of noise from a noise database. Consistent improvements in word recognition accuracy (WRA) are achieved using a cross-validation scheme over different signal-to-noise ratios.
UR - http://www.scopus.com/inward/record.url?scp=0034502214&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0034502214&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:0034502214
SP - 1073
EP - 1076
ER -