Audio scene segmentation using multiple features, models and time scales

Hari Sundaram, Shih-Fu Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: (a) a definition of an audio scene, (b) multiple feature models that characterize the dominant sources, and (c) a simple, causal listener model that mimics human audition using multiple time scales. We define a correlation function that measures how strongly current features correlate with past data, and use it to determine segmentation boundaries. The algorithm was tested on a difficult data set, a one-hour audio segment of a film, with impressive results: it achieves an audio scene change detection accuracy of 97%.
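The causal listener model described in the abstract can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the feature representation, window sizes, correlation measure, and threshold are all assumptions chosen for clarity.

```python
# Hedged sketch of causal, multi-time-scale audio scene segmentation.
# Frames are assumed to be per-frame feature vectors (e.g. spectral
# features); scale lengths and the threshold are illustrative only.

def correlation(a, b):
    """Normalized (Pearson-style) correlation between two feature vectors."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    if da == 0 or db == 0:
        return 0.0
    return num / (da * db)

def detect_scene_changes(frames, scales=(4, 8, 16), threshold=0.2):
    """Causal model: at each frame, correlate the current feature vector
    with the mean of the preceding window at several time scales.
    A boundary is declared when correlation drops below `threshold`
    at a majority of scales, i.e. a majority of sources has changed."""
    boundaries = []
    for t in range(max(scales), len(frames)):
        low = 0
        for w in scales:
            past = frames[t - w:t]                      # only past data (causal)
            mean = [sum(col) / w for col in zip(*past)]  # per-dimension mean
            if correlation(frames[t], mean) < threshold:
                low += 1
        if low > len(scales) / 2:                       # majority of scales agree
            boundaries.append(t)
    return boundaries
```

On synthetic data where all feature dimensions flip at one point, the first detected boundary coincides with the flip; on real audio, consecutive detections around a change would typically be merged into a single scene boundary.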

Original language: English (US)
Title of host publication: Image and Multidimensional Signal Processing / Multimedia Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2441-2444
Number of pages: 4
ISBN (Electronic): 0780362934
DOIs
State: Published - 2000
Externally published: Yes
Event: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000 - Istanbul, Turkey
Duration: Jun 5 2000 - Jun 9 2000

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 4
ISSN (Print): 1520-6149

Other

Other: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Country/Territory: Turkey
City: Istanbul
Period: 6/5/00 - 6/9/00

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
