Duration dependent input output Markov models for audio-visual event detection

Milind R. Naphade, Ashutosh Garg, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Detecting semantic events from audio-visual data with spatiotemporal support is a challenging multimedia understanding problem. The difficulty lies in the gap between low-level media features and high-level semantic concepts. We present a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model nonexponential duration densities with the mapping of input sequences to output sequences. In spirit it resembles the IOHMMs [1] as well as inhomogeneous HMMs [2]. We use the DDIOMM to model the audio-visual event "explosion". We compare the detection performance of the DDIOMM with that of the IOMM as well as the HMM. Experiments reveal that modeling duration improves detection performance.
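The abstract's central idea, replacing the implicit geometric state durations of a standard HMM with explicit, nonexponential duration densities while conditioning transitions on an input stream, can be illustrated with a short decoding sketch. The code below is not the authors' DDIOMM; it is a hypothetical semi-Markov, Viterbi-style pass in which the arrays `trans`, `emit`, `dur`, and all shapes and symbol alphabets are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: explicit-duration decoding with
# input-conditioned transitions. Parameter names and shapes are
# hypothetical, not taken from the paper.
import numpy as np

def duration_dependent_viterbi(inputs, trans, emit, dur, max_dur):
    """Most likely state sequence under explicit duration densities.

    inputs  : (T,) int array of discretized input symbols
    trans   : (K, S, S) transition probs, one S x S matrix per input symbol
    emit    : (S, K) probs of observing each input symbol in each state
    dur     : (S, max_dur) probs that a state lasts d+1 frames (nonexponential)
    """
    T, S = len(inputs), emit.shape[0]
    delta = np.full((T, S), -np.inf)            # best score of a run of s ending at t
    back = np.full((T, S, 2), -1, dtype=int)    # (previous end time, previous state)

    for t in range(T):
        for s in range(S):
            for d in range(1, min(max_dur, t + 1) + 1):
                start = t - d + 1
                seg = np.sum(np.log(emit[s, inputs[start:t + 1]]))
                score = np.log(dur[s, d - 1]) + seg
                if start == 0:
                    cand, prev = score, (-1, -1)   # uniform initial prior, omitted
                else:
                    # transition conditioned on the input seen at the segment boundary
                    k = inputs[start]
                    p = int(np.argmax(delta[start - 1] + np.log(trans[k][:, s])))
                    cand = delta[start - 1, p] + np.log(trans[k][p, s]) + score
                    prev = (start - 1, p)
                if cand > delta[t, s]:
                    delta[t, s], back[t, s] = cand, prev

    # Backtrack segment structure into a per-frame state sequence.
    path = np.empty(T, dtype=int)
    t, s = T - 1, int(np.argmax(delta[T - 1]))
    while t >= 0:
        prev_t, prev_s = back[t, s]
        path[prev_t + 1:t + 1] = s
        t, s = prev_t, prev_s
    return path
```

With toy parameters, `dur` lets a state prefer, for example, runs of three to five frames, a constraint a plain HMM cannot express because its self-loop implies a geometric duration distribution.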

Original language: English (US)
Title of host publication: Proceedings - IEEE International Conference on Multimedia and Expo
Publisher: IEEE Computer Society
Pages: 253-256
Number of pages: 4
ISBN (Electronic): 0769511988
DOIs
State: Published - Jan 1 2001
Event: 2001 IEEE International Conference on Multimedia and Expo, ICME 2001 - Tokyo, Japan
Duration: Aug 22 2001 - Aug 25 2001

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Other

Other: 2001 IEEE International Conference on Multimedia and Expo, ICME 2001
Country: Japan
City: Tokyo
Period: 8/22/01 - 8/25/01

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
