Abstract
In this paper we present a novel algorithm for video scene segmentation. We model a scene as a semantically consistent chunk of audio-visual data. Central to the segmentation framework is the idea of a finite-memory model. We separately segment the audio and video data into scenes, using the data in the memory. The audio segmentation algorithm determines the correlations amongst the envelopes of audio features. The video segmentation algorithm determines the correlations amongst shot key-frames. The scene boundaries in both cases are determined using local correlation minima. Then, we fuse the resulting segments using a nearest-neighbor algorithm that is further refined using a time-alignment distribution derived from the ground truth. The algorithm was tested on a difficult data set: the first hour of a commercial film, with good results. It achieves a scene segmentation accuracy of 84%.
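As a rough illustration of the boundary rule summarized above (scene boundaries placed at local minima of a correlation signal computed over a finite memory of recent data), the following Python sketch shows one possible form of that idea. The function name, feature representation, and window parameters are illustrative assumptions, not the paper's actual implementation or fusion step.

```python
import numpy as np
from scipy.signal import argrelmin

def detect_scene_boundaries(features, memory=16, order=3):
    """Illustrative sketch only: mark scene boundaries at local minima of a
    short-term correlation signal computed over a finite memory window.

    features : (T, D) array of per-shot (or per-frame) feature vectors,
               e.g. key-frame descriptors or audio-feature envelopes.
    memory   : number of past items kept in the finite memory.
    order    : neighborhood size used when locating local minima.
    """
    T = len(features)
    coherence = np.ones(T)
    for t in range(1, T):
        past = features[max(0, t - memory):t]  # contents of the finite memory
        cur = features[t]
        # mean correlation between the current item and the memory contents
        corrs = [np.corrcoef(cur, p)[0, 1] for p in past]
        coherence[t] = np.nanmean(corrs)
    # scene boundaries at local minima of the coherence curve
    return argrelmin(coherence, order=order)[0]

# Toy usage with random features standing in for real descriptors.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(m, 1.0, size=(20, 8)) for m in (0.0, 3.0, -2.0)])
print(detect_scene_boundaries(feats))
```

In this sketch the audio and video streams would each be passed through such a detector separately, and the resulting boundary lists then merged; the nearest-neighbor fusion and ground-truth-derived time-alignment refinement described in the abstract are not shown.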
| Original language | English (US) |
| --- | --- |
| Pages | 1145-1148 |
| Number of pages | 4 |
| State | Published - 2000 |
| Externally published | Yes |
| Event | 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) - New York, NY, United States; Duration: Jul 30 2000 → Aug 2 2000 |
Other

| Other | 2000 IEEE International Conference on Multimedia and Expo (ICME 2000) |
| --- | --- |
| Country/Territory | United States |
| City | New York, NY |
| Period | 7/30/00 → 8/2/00 |
ASJC Scopus subject areas
- General Engineering