An audio-visual fusion framework with joint dimensionality reduction

Ming Liu, Yun Fu, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Combining the audio and visual modalities gives speech recognition systems higher performance and robustness. Fusion strategies to date fall mainly into three types: feature-level fusion, model-level fusion, and decision-level fusion. In this paper, we present a novel audio-visual fusion framework in which a joint dimensionality reduction approach projects the audio and visual features into more compact subspaces. Under a correlation-preserving criterion, the projected audio and visual features retain the correlation conveyed in the original audio and visual feature spaces. At the same time, the more compact feature spaces yield better model efficiency. Experiments on audio-visual person verification demonstrate the efficiency and effectiveness of the proposed fusion framework.
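
The abstract names a correlation-preserving criterion, and the keywords list canonical correlation analysis (CCA), but this page does not give the authors' exact formulation. As a minimal sketch, assuming plain linear CCA stands in for the joint dimensionality reduction step, the Python below projects two feature sets into k-dimensional subspaces whose paired components are maximally correlated. The function name `fit_cca`, the ridge term `reg`, and the array shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-6):
    """Linear CCA sketch (hypothetical helper, not the paper's method).

    X: (n, dx) array, e.g. audio features; Y: (n, dy) array, e.g. visual
    features, with rows paired by sample. Requires k <= min(dx, dy).
    Returns the feature means, the two projection matrices, and the top-k
    canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]

    # Sample covariances; the small ridge term `reg` (an assumption) keeps
    # the Cholesky factorizations numerically stable.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each view: with Sxx = Lx Lx^T and Syy = Ly Ly^T,
    # form T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original feature coordinates.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, Wx, Wy, s[:k]
```

Compact features are then obtained as `(X - mx) @ Wx` and `(Y - my) @ Wy`. How these projections feed a downstream verification model is not described on this page, so that step is left out of the sketch.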

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Pages: 4437-4440
Number of pages: 4
DOIs
State: Published - 2008
Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
Duration: Mar 31 2008 - Apr 4 2008

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Country/Territory: United States
City: Las Vegas, NV
Period: 3/31/08 - 4/4/08

Keywords

  • Audio-visual fusion
  • Audio-visual person verification
  • Canonical correlation analysis
  • Dimensionality reduction

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
