A utility framework for the automatic generation of audio-visual skims

Hari Sundaram, Lexing Xie, Shih Fu Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present a novel algorithm for generating audio-visual skims from computable scenes. Skims are useful for browsing digital libraries and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation and (c) a utility model for skim generation. We define a measure of the visual complexity of a shot, and map complexity to the minimum time for comprehending the shot. Then, we analyze the underlying visual grammar, since it makes the shot sequence meaningful. We segment the audio data into four classes, and then detect significant phrases in the speech segments. The utility functions are defined in terms of the complexity and duration of the segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and the coherence of the resulting skim. The objective function is constrained by multimedia synchronization constraints, by visual syntax and by penalty functions on audio and video segments. The user study results indicate that the optimal skims show statistically significant differences from other skims at compression rates of up to 90%.
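The constrained utility maximization described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the linear complexity-to-time map, the saturating utility function, the greedy allocation, and every name and constant below are assumptions made for illustration. The coherence term, the synchronization constraints, the visual syntax and the audio penalty functions are all left out.

```python
import math

def min_time(complexity, base=0.5, slope=2.0):
    # Hypothetical linear map from complexity in [0, 1] to the minimum
    # comprehension time in seconds; the constants are placeholders.
    return base + slope * complexity

def utility(duration, complexity):
    # Hypothetical concave utility: increases with duration, saturates,
    # and is scaled by the complexity of the shot.
    return complexity * (1.0 - math.exp(-duration))

def allocate_skim(shots, budget, step=0.1):
    """Assign a duration to each shot so the durations sum to `budget`.

    shots  -- list of (complexity, original_duration_seconds)
    budget -- target skim length in seconds
    Each shot keeps at least its minimum comprehension time and never
    exceeds its original duration.
    """
    durations = [min(min_time(c), orig) for c, orig in shots]
    remaining = budget - sum(durations)
    if remaining < 0:
        raise ValueError("budget is below the sum of minimum times")

    # Greedy step: give the next slice of time to the shot with the
    # highest utility gain per second.
    while remaining > 1e-9:
        best, best_gain, best_inc = None, -1.0, 0.0
        for i, (c, orig) in enumerate(shots):
            room = orig - durations[i]
            if room <= 1e-9:
                continue
            inc = min(step, room, remaining)
            d = durations[i]
            gain = (utility(d + inc, c) - utility(d, c)) / inc
            if gain > best_gain:
                best, best_gain, best_inc = i, gain, inc
        if best is None:  # every shot is already at full length
            break
        durations[best] += best_inc
        remaining -= best_inc
    return durations

if __name__ == "__main__":
    shots = [(0.2, 4.0), (0.8, 6.0), (0.5, 3.0)]
    print(allocate_skim(shots, budget=6.0))
```

Because the assumed utility is concave and separable across shots, the greedy step is a reasonable stand-in for a general optimizer here; a fuller treatment would need to handle dropped shots and coupled audio-video constraints.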

Original language: English (US)
Title of host publication: Proceedings of the 10th ACM International Conference on Multimedia, MULTIMEDIA 2002
Publisher: Association for Computing Machinery
Pages: 189-198
Number of pages: 10
ISBN (Electronic): 158113620X, 9781581136203
State: Published - Dec 1 2002
Externally published: Yes
Event: 10th ACM International Conference on Multimedia, MULTIMEDIA 2002 - Juan-les-Pins, France
Duration: Dec 1 2002 → Dec 6 2002

Publication series

Name: Proceedings of the 10th ACM International Conference on Multimedia, MULTIMEDIA 2002

Conference

Conference: 10th ACM International Conference on Multimedia, MULTIMEDIA 2002
Country/Territory: France
City: Juan-les-Pins
Period: 12/1/02 → 12/6/02

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
