Collaborative Audio Enhancement (CAE) aims to separate a dominant source from crowdsourced recordings of a scene. This paper formulates CAE as a large-scale ad-hoc microphone array problem, assuming hundreds of sensors scattered over a large scene, e.g. a concert hall or a street riot. An important characteristic of such settings is that not all sensors capture useful information, mainly because of strong local noise interference and recording artifacts. This renders traditional array processing techniques inadequate for tasks such as source enhancement. One way to recover the most common source while suppressing recording-specific interference is to share latent components across simultaneous models of multiple magnitude spectrograms. The proposed method improves on both the quality and the computational requirements of such a model by performing a two-stage nearest-neighbor search at every EM update. Its optional first-round search uses Hamming distance between hashed spectrograms to quickly find a redundant candidate set, and a subsequent step then narrows that set down to a subset using the more appropriate cross entropy. Experimental results show that the proposed neighborhood schemes converge to better-quality solutions faster than the comprehensive model that uses all of the data.
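
As an illustration of the two-stage search described above, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that the items being compared are non-negative spectral vectors (columns of a magnitude spectrogram), and that hashing is done with the signs of random projections; the abstract does not specify the hash function, so that choice, along with the function names and the parameters `n_bits`, `n_candidates`, and `n_neighbors`, is hypothetical.

```python
import numpy as np


def hash_bits(X, projections):
    """Binary hash: one bit per random projection (assumed hashing scheme)."""
    return (projections @ X) > 0  # shape: (n_bits, n_items)


def two_stage_search(query, frames, n_bits=64, n_candidates=20,
                     n_neighbors=5, seed=0):
    """Return indices of the columns of `frames` closest to `query`.

    Stage 1 (cheap): rank all columns by Hamming distance between hash codes
    and keep a deliberately redundant candidate set.
    Stage 2 (accurate): re-rank only those candidates by cross entropy.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12

    # Normalize magnitudes so each vector can be treated as a distribution.
    q = query / (query.sum() + eps)
    P = frames / (frames.sum(axis=0, keepdims=True) + eps)

    # Stage 1: Hamming distance between hashed vectors.
    projections = rng.standard_normal((n_bits, frames.shape[0]))
    q_bits = hash_bits(q[:, None], projections)
    P_bits = hash_bits(P, projections)
    hamming = np.count_nonzero(q_bits != P_bits, axis=0)
    candidates = np.argsort(hamming, kind="stable")[:n_candidates]

    # Stage 2: cross entropy between the query and each surviving candidate.
    cross_entropy = -(q[:, None] * np.log(P[:, candidates] + eps)).sum(axis=0)
    order = np.argsort(cross_entropy, kind="stable")[:n_neighbors]
    return candidates[order]


# Usage: 200 random spectral vectors of 32 bins; the query is a scaled copy
# of column 7, so that column should be ranked first.
rng = np.random.default_rng(1)
frames = rng.uniform(0.1, 1.0, size=(32, 200))
neighbors = two_stage_search(3.0 * frames[:, 7], frames)
```

The design point is that the Hamming stage touches every item but costs only bit comparisons, while the more expensive cross entropy is evaluated on the small candidate set alone.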