How to put it into words - Using random forests to extract symbol level descriptions from audio content for concept detection

Po Sen Huang, Robert Mertens, Ajay Divakaran, Gerald Friedland, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a system that uses symbolic representations of audio concepts as words to describe audio tracks, enabling it to go beyond the state of the art, which is audio event classification of a small number of audio classes in constrained settings, to large-scale classification in the wild. These audio words may be less meaningful to a human annotator, but they are descriptive for computer algorithms. We devise a random-forest vocabulary learning method with an audio word weighting scheme based on TF-IDF and TD-IDD, so as to combine the computational simplicity and accurate multi-class classification of the random forest with the data-driven discriminative power of the TF-IDF/TD-IDD methods. The proposed random-forest clustering with text-retrieval methods significantly outperforms two state-of-the-art methods on both the dry-run set and the full set of the TRECVID MED 2010 dataset.
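The weighting scheme the abstract describes treats each audio clip as a bag of "audio words" (e.g. leaf indices a random forest assigns to audio frames) and reweights them with TF-IDF. As a rough illustration of that idea only, the following pure-Python sketch applies standard TF-IDF weighting to hypothetical audio-word histograms; the function name and toy data are illustrative, and the paper's actual vocabulary learning and TD-IDD weighting may differ in detail.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """TF-IDF weights for bags of 'audio words'.

    Each document is a list of audio-word IDs (e.g. random-forest
    leaf indices assigned to frames of one clip). Returns one
    {word: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many clips each audio word occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)           # raw term counts in this clip
        total = len(doc)
        weighted.append({
            w: (tf[w] / total) * math.log(n_docs / df[w])
            for w in tf
        })
    return weighted

# Three toy "clips", each a bag of hypothetical audio-word IDs.
clips = [[3, 3, 7, 9], [3, 7, 7, 7], [9, 9, 1, 3]]
weights = tfidf_weights(clips)
```

Audio words that occur in every clip (here, word 3) get zero weight, while words concentrated in a few clips are boosted, which is the discriminative, data-driven effect the abstract attributes to the text-retrieval weighting.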

Original language: English (US)
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 505-508
Number of pages: 4
DOIs: https://doi.org/10.1109/ICASSP.2012.6287927
State: Published - Oct 23 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: Mar 25 2012 - Mar 30 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country: Japan
City: Kyoto
Period: 3/25/12 - 3/30/12

Keywords

  • Audio Classification
  • Inverse Document Frequency
  • Multimedia Event Detection
  • Random Forests
  • Term Frequency

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Huang, P. S., Mertens, R., Divakaran, A., Friedland, G., & Hasegawa-Johnson, M. (2012). How to put it into words - Using random forests to extract symbol level descriptions from audio content for concept detection. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings (pp. 505-508). [6287927] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2012.6287927