TY - GEN
T1 - Computer vision for music identification
AU - Ke, Yan
AU - Hoiem, Derek
AU - Sukthankar, Rahul
PY - 2005
Y1 - 2005
N2 - We describe how certain tasks in the audio domain can be effectively addressed using computer vision approaches. This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. Our approach treats the spectrogram of each music clip as a 2-D image and transforms music identification into a corrupted sub-image retrieval problem. By employing pairwise boosting on a large set of Viola-Jones features, our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based "occlusion" model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can quickly and accurately recognize music from short audio samples in the presence of distortions such as poor recording quality and significant ambient noise. Our experiments demonstrate that this approach significantly outperforms the current state-of-the-art in content-based music identification.
UR - http://www.scopus.com/inward/record.url?scp=33745127045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745127045&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2005.105
DO - 10.1109/CVPR.2005.105
M3 - Conference contribution
AN - SCOPUS:33745127045
SN - 0769523722
SN - 9780769523729
T3 - Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005
SP - 597
EP - 604
BT - Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005
PB - IEEE Computer Society
T2 - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005
Y2 - 20 June 2005 through 25 June 2005
ER -