TY - GEN
T1 - How to put it into words - Using random forests to extract symbol level descriptions from audio content for concept detection
AU - Huang, Po Sen
AU - Mertens, Robert
AU - Divakaran, Ajay
AU - Friedland, Gerald
AU - Hasegawa-Johnson, Mark
PY - 2012
Y1 - 2012
N2 - This paper presents a system that uses symbolic representations of audio concepts as words for the descriptions of audio tracks, that enable it to go beyond the state of the art, which is audio event classification of a small number of audio classes in constrained settings, to large-scale classification in the wild. These audio words might be less meaningful for an annotator but they are descriptive for computer algorithms. We devise a random-forest vocabulary learning method with an audio word weighting scheme based on TF-IDF and TD-IDD, so as to combine the computational simplicity and accurate multi-class classification of the random forest with the data-driven discriminative power of the TF-IDF/TD-IDD methods. The proposed random forest clustering with text-retrieval methods significantly outperforms two state-of-the-art methods on the dry-run set and the full set of the TRECVID MED 2010 dataset.
AB - This paper presents a system that uses symbolic representations of audio concepts as words for the descriptions of audio tracks, that enable it to go beyond the state of the art, which is audio event classification of a small number of audio classes in constrained settings, to large-scale classification in the wild. These audio words might be less meaningful for an annotator but they are descriptive for computer algorithms. We devise a random-forest vocabulary learning method with an audio word weighting scheme based on TF-IDF and TD-IDD, so as to combine the computational simplicity and accurate multi-class classification of the random forest with the data-driven discriminative power of the TF-IDF/TD-IDD methods. The proposed random forest clustering with text-retrieval methods significantly outperforms two state-of-the-art methods on the dry-run set and the full set of the TRECVID MED 2010 dataset.
KW - Audio Classification
KW - Multimedia Event Detection
KW - Random Forests
KW - Term Frequency-Inverse Document Frequency
UR - http://www.scopus.com/inward/record.url?scp=84867619502&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867619502&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2012.6287927
DO - 10.1109/ICASSP.2012.6287927
M3 - Conference contribution
AN - SCOPUS:84867619502
SN - 9781467300469
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 505
EP - 508
BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Y2 - 25 March 2012 through 30 March 2012
ER -