Optimal multimodal fusion for multimedia data analysis

Yi Wu, Kevin Chen Chuan Chang, Edward Y. Chang, John R. Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Considerable research has been devoted to utilizing multimodal features for better understanding multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach. The first step finds statistically independent modalities from raw features. In the second step, we use super-kernel fusion to determine the optimal combination of individual modalities. We carefully analyze the tradeoffs between three design factors that affect fusion performance: modality independence, curse of dimensionality, and fusion-model complexity. Through analytical and empirical studies, we demonstrate that our two-step approach, which achieves a careful balance of the three design factors, can improve class-prediction accuracy over traditional techniques.
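The abstract does not spell out either step, so the following is only a loose sketch of the two-step shape it describes, not the authors' method. Assumptions: a greedy correlation-based grouping stands in for the independence analysis of the first step, and a stacked least-squares combiner stands in for super-kernel fusion in the second. All function names, the `threshold` and `ridge` parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def group_features(X, threshold=0.5):
    """Step 1 stand-in: greedily group feature columns whose absolute
    correlation with a seed column exceeds `threshold` (an assumed value).
    Each group plays the role of one 'modality'."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(corr.shape[0]))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed]
        for j in unassigned[:]:          # iterate over a copy while removing
            if corr[seed, j] > threshold:
                group.append(j)
                unassigned.remove(j)
        groups.append(group)
    return groups

def fit_linear(X, y, ridge=1e-3):
    """Ridge-regularized least-squares classifier with a bias term.
    Labels y are expected in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def score_linear(w, X):
    """Real-valued score of a fitted linear classifier."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def modality_scores(groups, base, X):
    """Stack one score column per modality."""
    return np.column_stack(
        [score_linear(w, X[:, g]) for w, g in zip(base, groups)]
    )

def fit_fusion(X, y, threshold=0.5):
    """Step 2 stand-in: train one classifier per modality, then train a
    second-level classifier on the stacked per-modality scores."""
    groups = group_features(X, threshold)
    base = [fit_linear(X[:, g], y) for g in groups]
    meta = fit_linear(modality_scores(groups, base, X), y)
    return groups, base, meta

def predict_fusion(model, X):
    groups, base, meta = model
    return np.sign(score_linear(meta, modality_scores(groups, base, X)))

# Synthetic data: two independent latent sources, two noisy features each.
rng = np.random.default_rng(0)
n = 200
a, b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    a, a + 0.1 * rng.normal(size=n),
    b, b + 0.1 * rng.normal(size=n),
])
y = np.sign(a + b)

model = fit_fusion(X, y)
print("modalities:", model[0])
print("training accuracy:", (predict_fusion(model, X) == y).mean())
```

The sketch keeps the three design factors from the abstract visible: grouping controls how independent the modalities are, group size controls the dimensionality each base classifier sees, and the second-level model controls fusion complexity. Here the second level is linear for brevity; a nonlinear combiner would be closer in spirit to a kernel-based fusion.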

Original language: English (US)
Title of host publication: ACM Multimedia 2004 - proceedings of the 12th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 572-579
Number of pages: 8
ISBN (Print): 1581138938, 9781581138931
State: Published - 2004
Event: ACM Multimedia 2004 - proceedings of the 12th ACM International Conference on Multimedia - New York, NY, United States
Duration: Oct 10 2004 → Oct 16 2004

Publication series

Name: ACM Multimedia 2004 - proceedings of the 12th ACM International Conference on Multimedia

Other

Other: ACM Multimedia 2004 - proceedings of the 12th ACM International Conference on Multimedia
Country/Territory: United States
City: New York, NY
Period: 10/10/04 → 10/16/04

Keywords

  • Curse of dimensionality
  • Independent analysis
  • Modality independence
  • Multimodal fusion
  • Super-kernel fusion

ASJC Scopus subject areas

  • Engineering(all)
