TY - GEN
T1 - Audio scene segmentation using multiple features, models and time scales
AU - Sundaram, Hari
AU - Chang, Shih Fu
N1 - Publisher Copyright: © 2000 IEEE.
PY - 2000
Y1 - 2000
AB - In this paper, we present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: (a) a definition of an audio scene, (b) multiple feature models that characterize the dominant sources, and (c) a simple, causal listener model that mimics human audition using multiple time scales. We define a correlation function that measures correlation with past data to determine segmentation boundaries. The algorithm was tested on a difficult data set, a 1-hour audio segment of a film, where it achieves an audio scene change detection accuracy of 97%.
UR - http://www.scopus.com/inward/record.url?scp=0033677049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033677049&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2000.859335
DO - 10.1109/ICASSP.2000.859335
M3 - Conference contribution
AN - SCOPUS:0033677049
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2441
EP - 2444
BT - Image and Multidimensional Signal Processing; Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Y2 - 5 June 2000 through 9 June 2000
ER -