Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization

Kai Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King, Mark Hasegawa-Johnson, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a saliency-maximized audio spectrogram as a representation that lets human analysts quickly search for and detect events in audio recordings. By rendering target events as visually salient patterns, this representation minimizes the time and effort needed to examine a recording. In particular, we propose a transformation of a conventional spectrogram that maximizes the mutual information between the spectrograms of isolated target events and the estimated saliency of the overall visual representation. When subjects are shown spectrograms that are saliency-maximized, they perform significantly better in a 1/10-real-time acoustic event detection task.
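The objective the abstract describes, maximizing the mutual information between the spectrograms of isolated target events and the estimated saliency of the visualization, can be illustrated with a simple histogram-based MI estimate. This is a minimal sketch, not the paper's actual saliency model or optimization: the `mutual_information` helper, the toy spectrogram, and the contrast-boosting "saliency" proxy are all illustrative assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram-based estimate of MI between two equal-shape 2-D arrays.

    Illustrative stand-in for the paper's MI objective; the paper's
    saliency estimator is not reproduced here.
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Toy setup: a "target event" patch embedded in a noisy spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((64, 128))              # conventional spectrogram (freq x time)
target = np.zeros_like(spec)
target[20:30, 40:60] = 1.0                # isolated target-event region
visualization = spec + 0.5 * target       # event partially masked by noise

# A crude contrast-boosting transform as a hypothetical "saliency" proxy.
boosted = np.clip(visualization - np.median(visualization), 0.0, None) ** 2

mi_plain = mutual_information(target, visualization)
mi_boost = mutual_information(target, boosted)
print(f"MI(target, plain)   = {mi_plain:.3f}")
print(f"MI(target, boosted) = {mi_boost:.3f}")
```

A saliency-maximizing transform would search over such visualizations for the one that makes this MI term as large as possible, so that the target events dominate what the viewer's eye is drawn to.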

Original language: English (US)
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 2277-2280
Number of pages: 4
DOIs
State: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: Mar 25 2012 - Mar 30 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 3/25/12 - 3/30/12

Keywords

  • acoustic event detection
  • audio visualization
  • visual saliency

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
