TY - GEN
T1 - Detecting semantic concepts using context and audiovisual features
AU - Naphade, M. R.
AU - Huang, T. S.
N1 - Publisher Copyright:
© 2001 IEEE.
PY - 2001
Y1 - 2001
N2 - Detection of high-level semantics from audio-visual data is a challenging multimedia understanding problem. The difficulty lies mainly in the gap between low-level media features and high-level semantic concepts. In an attempt to bridge this gap, Naphade et al. (see Proceedings of Workshop on Content Based Access to Image and Video Libraries, p.35-39, 2000, and Proceedings of IEEE International Conference on Image Processing, Chicago, IL, vol.3, p.536-40, 1998) proposed a probabilistic framework for semantic understanding. The components of this framework are probabilistic multimedia objects and a graphical network of such objects. We show how the framework supports the detection of multiple high-level concepts that enjoy spatial and temporal support. More importantly, we show why context matters and how it can be modeled. Using a factor graph framework, we model context and use it to improve the detection of sites, objects and events. Using the concepts outdoor and flying-helicopter, we demonstrate how the factor graph multinet models context and uses it for late integration of multimodal features. Using ROC curves and probability-of-error curves, we support the intuition that context should help.
AB - Detection of high-level semantics from audio-visual data is a challenging multimedia understanding problem. The difficulty lies mainly in the gap between low-level media features and high-level semantic concepts. In an attempt to bridge this gap, Naphade et al. (see Proceedings of Workshop on Content Based Access to Image and Video Libraries, p.35-39, 2000, and Proceedings of IEEE International Conference on Image Processing, Chicago, IL, vol.3, p.536-40, 1998) proposed a probabilistic framework for semantic understanding. The components of this framework are probabilistic multimedia objects and a graphical network of such objects. We show how the framework supports the detection of multiple high-level concepts that enjoy spatial and temporal support. More importantly, we show why context matters and how it can be modeled. Using a factor graph framework, we model context and use it to improve the detection of sites, objects and events. Using the concepts outdoor and flying-helicopter, we demonstrate how the factor graph multinet models context and uses it for late integration of multimodal features. Using ROC curves and probability-of-error curves, we support the intuition that context should help.
UR - http://www.scopus.com/inward/record.url?scp=4944223905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4944223905&partnerID=8YFLogxK
U2 - 10.1109/EVENT.2001.938871
DO - 10.1109/EVENT.2001.938871
M3 - Conference contribution
AN - SCOPUS:4944223905
T3 - Proceedings - IEEE Workshop on Detection and Recognition of Events in Video, EVENT 2001
SP - 92
EP - 98
BT - Proceedings - IEEE Workshop on Detection and Recognition of Events in Video, EVENT 2001
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Workshop on Detection and Recognition of Events in Video, EVENT 2001
Y2 - 8 July 2001
ER -