Saliency-maximized audio visualization and efficient audio-visual browsing for faster-than-real-time human acoustic event detection

Kai Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King, Mark Hasegawa-Johnson, Thomas S. Huang

Research output: Contribution to journal › Article › peer-review

Abstract

Browsing large audio archives is challenging because of the limitations of human audition and attention. However, this task becomes easier with a suitable visualization of the audio signal, such as a spectrogram transformed to make unusual audio events salient. This transformation maximizes the mutual information between an isolated event's spectrogram and an estimate of how salient the event appears in its surrounding context. When such spectrograms are computed and displayed with fluid zooming over many temporal orders of magnitude, sparse events in long audio recordings can be detected more quickly and more easily. In particular, in a 1/10-real-time acoustic event detection task, subjects who were shown saliency-maximized rather than conventional spectrograms performed significantly better. Saliency maximization also improves the mutual information between the ground truth of nonbackground sounds and visual saliency, more than other common enhancements to visualization.
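The paper's transformation is learned by maximizing mutual information between an event's spectrogram and its estimated visual salience; that method is not reproduced here. As a loose, hypothetical illustration of the underlying idea (making sparse events stand out against a steady background), the sketch below computes a plain magnitude spectrogram with NumPy and applies a simple per-frequency median-background subtraction. The function names, frame sizes, and the toy signal are all assumptions for illustration only.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    # Magnitude STFT: frame the signal, apply a Hann window, take |FFT|.
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(x[s:s + n_fft] * window))
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames).T  # shape: (freq_bins, time_frames)

def enhance_salience(spec, eps=1e-8):
    # Crude novelty map (NOT the paper's learned transformation):
    # log magnitude minus each frequency bin's median over time,
    # so sounds that differ from the steady background stand out.
    log_spec = np.log(spec + eps)
    background = np.median(log_spec, axis=1, keepdims=True)
    return np.maximum(log_spec - background, 0.0)

# Toy signal: 1 s of quiet noise containing a brief loud tone (a "rare event").
rng = np.random.default_rng(0)
fs = 8000
x = 0.01 * rng.standard_normal(fs)
t = np.arange(1000) / fs
x[3000:4000] += 0.5 * np.sin(2 * np.pi * 1000 * t)  # 125 ms tone burst

spec = spectrogram(x)
sal = enhance_salience(spec)
# The frame where the enhanced map peaks falls inside the tone burst.
event_frame = int(np.argmax(sal.max(axis=0)))
```

In a real browsing interface, the enhanced map would then be rendered with fluid zooming over long recordings; the point here is only that a salience-style enhancement concentrates visual energy on the rare event rather than the background.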

Original language: English (US)
Article number: 26
Journal: ACM Transactions on Applied Perception
Volume: 10
Issue number: 4
State: Published - Oct 2013

Keywords

  • Acoustic event detection
  • Audio visualization
  • Visual salience/saliency

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
  • Experimental and Cognitive Psychology
