Stochastic modeling of soundtrack for efficient segmentation and indexing of video

Milind R. Naphade, Thomas S. Huang

Research output: Contribution to journal › Conference article › peer-review

Abstract

Tools for efficient and intelligent management of digital content are essential for digital video data management. An extremely challenging research area in this context is that of multimedia analysis and understanding. The capabilities of audio analysis in particular for video data management are yet to be fully exploited. We present a novel scheme for indexing and segmentation of video by analyzing the audio track. This analysis is then applied to the segmentation and indexing of movies. We build models for some interesting events in the motion picture soundtrack. The models built include music, human speech and silence. We propose the use of hidden Markov models to model the dynamics of the soundtrack and detect audio-events. Using these models we segment and index the soundtrack. A practical problem in motion picture soundtracks is that the audio in the track is of a composite nature. This corresponds to the mixing of sounds from different sources. Speech in foreground and music in background are common examples. The coexistence of multiple individual audio sources forces us to model such events explicitly. Experiments reveal that explicit modeling gives better results than modeling individual audio events separately.

Original language: English (US)
Pages (from-to): 168-176
Number of pages: 9
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 3972
State: Published - 2000
Event: Proceedings of the 2000 'Storage and Retrieval for Media Databases 2000' - San Jose, CA, USA
Duration: Jan 26 2000 to Jan 28 2000

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Applied Mathematics
  • Electrical and Electronic Engineering
  • Computer Science Applications
