TY - GEN
T1 - Optimal multimodal fusion for multimedia data analysis
AU - Wu, Yi
AU - Chang, Kevin Chen-Chuan
AU - Chang, Edward Y.
AU - Smith, John R.
PY - 2004
Y1 - 2004
N2 - Considerable research has been devoted to utilizing multimodal features for better understanding multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach. The first step finds statistically independent modalities from raw features. In the second step, we use super-kernel fusion to determine the optimal combination of individual modalities. We carefully analyze the tradeoffs between three design factors that affect fusion performance: modality independence, curse of dimensionality, and fusion-model complexity. Through analytical and empirical studies, we demonstrate that our two-step approach, which achieves a careful balance of the three design factors, can improve class-prediction accuracy over traditional techniques.
KW - Curse of dimensionality
KW - Independent analysis
KW - Modality independence
KW - Multimodal fusion
KW - Super-kernel fusion
UR - http://www.scopus.com/inward/record.url?scp=13444263342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=13444263342&partnerID=8YFLogxK
U2 - 10.1145/1027527.1027665
DO - 10.1145/1027527.1027665
M3 - Conference contribution
AN - SCOPUS:13444263342
SN - 1581138938
SN - 9781581138931
T3 - ACM Multimedia 2004 - Proceedings of the 12th ACM International Conference on Multimedia
SP - 572
EP - 579
BT - ACM Multimedia 2004 - Proceedings of the 12th ACM International Conference on Multimedia
PB - Association for Computing Machinery
T2 - ACM Multimedia 2004 - Proceedings of the 12th ACM International Conference on Multimedia
Y2 - 10 October 2004 through 16 October 2004
ER -