A utility framework for the automatic generation of audio-visual skims

Hari Sundaram, Lexing Xie, Shih-Fu Chang

Research output: Contribution to conference › Paper › peer-review


In this paper, we present a novel algorithm for generating audio-visual skims from computable scenes. Skims are useful for browsing digital libraries and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistency with respect to chromaticity, lighting, and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation, and (c) a utility model for skim generation. We define a measure of the visual complexity of a shot and map complexity to the minimum time needed to comprehend the shot. We then analyze the underlying visual grammar, since it is what makes the shot sequence meaningful. We segment the audio data into four classes and then detect significant phrases in the speech segments. The utility functions are defined in terms of the complexity and duration of each segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and coherence of the resulting skim. The objective function is constrained by multimedia synchronization requirements, visual syntax, and penalty functions on audio and video segments. The user study results indicate that the optimal skims differ from other skims in a statistically significant way at compression rates of up to 90%.
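The paper's skim generator solves a constrained utility maximization; its exact utility functions and constraints are not reproduced here. The following is a minimal sketch of the core idea only: every shot first receives its minimum comprehension time, and the remaining duration budget is then distributed in proportion to visual complexity. The names `Segment` and `generate_skim`, and the proportional-allocation rule itself, are illustrative assumptions, not the authors' formulation.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    complexity: float  # assumed normalized visual complexity in [0, 1]
    duration: float    # original segment duration (seconds)
    min_time: float    # minimum comprehension time for the shot (seconds)

def generate_skim(segments, target_duration):
    """Allocate skim time to segments under a total-duration constraint.

    Each segment is guaranteed its minimum comprehension time; leftover
    budget is split in proportion to complexity, capped at the segment's
    original duration. This is a simple stand-in for the paper's
    constrained utility maximization, not the published algorithm.
    """
    base = sum(s.min_time for s in segments)
    if base > target_duration:
        raise ValueError("target skim too short for comprehension times")
    alloc = [s.min_time for s in segments]
    budget = target_duration - base
    total_c = sum(s.complexity for s in segments) or 1.0
    for i, s in enumerate(segments):
        extra = min(budget * s.complexity / total_c,
                    s.duration - s.min_time)
        alloc[i] += extra
    return alloc
```

A 90% compression rate, as in the user study, would correspond to `target_duration` being one tenth of the total original duration.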

Original language: English (US)
Number of pages: 10
State: Published - 2002
Externally published: Yes
Event: 10th International Conference of Multimedia - Juan les Pins, France
Duration: Dec 1, 2002 - Dec 6, 2002


Other: 10th International Conference of Multimedia
City: Juan les Pins

ASJC Scopus subject areas

  • General Computer Science


