TY - GEN
T1 - Efficient neighborhood-based topic modeling for collaborative audio enhancement on massive crowdsourced recordings
AU - Kim, Minje
AU - Smaragdis, Paris
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grant No. 1319708.
Publisher Copyright:
© 2016 IEEE.
Copyright:
Copyright 2016 Elsevier B.V., All rights reserved.
PY - 2016/5/18
Y1 - 2016/5/18
N2 - Collaborative Audio Enhancement (CAE) aims at separating a dominant source from crowdsourced recordings of a scene. This paper formulates CAE as a large ad-hoc microphone array problem, assuming hundreds of sensors scattered over a large scene, e.g., a concert hall or a street riot. An important characteristic of such cases is that not all sensors capture useful information, mainly because of strong local noise interference and recording artifacts. This renders traditional array processing techniques inadequate for tasks such as source enhancement. One way to recover the most common source while suppressing recording-specific interference is to share latent components across simultaneous models of multiple magnitude spectrograms. The proposed method improves on both the quality and the computational requirements of such a model by using a two-stage nearest-neighborhood search at every EM update. Its optional first-round search uses the Hamming distance between hashed spectrograms to quickly find a redundant candidate set, and a subsequent step narrows that set down to a subset using the more appropriate cross entropy. Experimental results show that the proposed neighborhood schemes converge to better-quality solutions faster than the comprehensive model that uses all the data.
AB - Collaborative Audio Enhancement (CAE) aims at separating a dominant source from crowdsourced recordings of a scene. This paper formulates CAE as a large ad-hoc microphone array problem, assuming hundreds of sensors scattered over a large scene, e.g., a concert hall or a street riot. An important characteristic of such cases is that not all sensors capture useful information, mainly because of strong local noise interference and recording artifacts. This renders traditional array processing techniques inadequate for tasks such as source enhancement. One way to recover the most common source while suppressing recording-specific interference is to share latent components across simultaneous models of multiple magnitude spectrograms. The proposed method improves on both the quality and the computational requirements of such a model by using a two-stage nearest-neighborhood search at every EM update. Its optional first-round search uses the Hamming distance between hashed spectrograms to quickly find a redundant candidate set, and a subsequent step narrows that set down to a subset using the more appropriate cross entropy. Experimental results show that the proposed neighborhood schemes converge to better-quality solutions faster than the comprehensive model that uses all the data.
KW - Ad-hoc Microphone Array
KW - Collaborative Audio Enhancement
KW - Probabilistic Latent Component Sharing
KW - Probabilistic Topic Models
KW - Social Data
UR - http://www.scopus.com/inward/record.url?scp=84973380253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973380253&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7471633
DO - 10.1109/ICASSP.2016.7471633
M3 - Conference contribution
AN - SCOPUS:84973380253
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 41
EP - 45
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -