Detection of documentary scene changes by audio-visual fusion

Atulya Velivelli, Chong Wah Ngo, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. We observed that the visual component alone did not convey enough information to establish a semantic context for most portions of these videos, whereas jointly observing the visual and audio components conveyed a better semantic context. Based on these observations, we generated an audio score and a visual score. We then computed a weighted audio-visual score within an interval and adaptively expanded or shrank this interval until we found a local maximum score value. The video is ultimately divided into a set of intervals that correspond to the documentary scenes in the video. After obtaining this set of documentary scenes, we checked for any redundant detections.
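As a rough illustration of the fusion and interval-search steps described in the abstract, the sketch below combines per-frame audio and visual scores into a weighted score over an interval and greedily adjusts the interval boundary until the score reaches a local maximum. All names and parameters (audio_score, visual_score, the weight w, the step size) are hypothetical; the paper's actual scoring functions and expansion rules are not reproduced here.

```python
# Minimal sketch of weighted audio-visual fusion with adaptive interval search.
# Illustrative only; scores, weights, and step sizes are assumptions, not the
# authors' actual formulation.

def fused_score(audio, visual, start, end, w=0.5):
    """Weighted audio-visual score over the frame interval [start, end)."""
    a = sum(audio[start:end]) / (end - start)
    v = sum(visual[start:end]) / (end - start)
    return w * a + (1.0 - w) * v

def grow_interval(audio, visual, start, end, step=5, w=0.5):
    """Expand or shrink the right boundary until the fused score stops improving."""
    n = len(audio)
    best = fused_score(audio, visual, start, end, w)
    while True:
        candidates = []
        if end + step <= n:
            candidates.append(end + step)   # try expanding the interval
        if end - step > start:
            candidates.append(end - step)   # try shrinking the interval
        if not candidates:
            break
        scored = [(fused_score(audio, visual, start, e, w), e) for e in candidates]
        top_score, top_end = max(scored)
        if top_score <= best:               # local maximum reached
            break
        best, end = top_score, top_end
    return end, best
```

A full segmentation pass would presumably repeat this search from the end of each accepted interval and then check the resulting set of intervals for overlapping (redundant) detections, as the abstract describes.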

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Erwin M. Bakker, Michael S. Lew, Thomas S. Huang, Nicu Sebe, Xiang Zhou
Publisher: Springer
Pages: 227-237
Number of pages: 11
ISBN (Print): 9783540451136
DOIs
State: Published - 2003

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2728
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
