TY - JOUR
T1 - Direction of Arrival with One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization
AU - El Badawy, Dalia
AU - Dokmanic, Ivan
N1 - Manuscript received February 14, 2018; revised June 14, 2018 and August 7, 2018; accepted August 20, 2018. Date of publication August 24, 2018; date of current version September 10, 2018. This work was supported by the Swiss National Science Foundation under Grant 20FP-1 151073, "Inverse Problems regularized by Sparsity." The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Augusto Sarti. (Corresponding author: Dalia El Badawy.) D. El Badawy is with the École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland (e-mail: [email protected]).
PY - 2018/12
Y1 - 2018/12
AB - Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures, we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
KW - Direction-of-arrival estimation
KW - group sparsity
KW - monaural localization
KW - non-negative matrix factorization
KW - sound scattering
KW - universal speech model
UR - https://www.scopus.com/pages/publications/85052699536
UR - https://www.scopus.com/inward/citedby.url?scp=85052699536&partnerID=8YFLogxK
DO - 10.1109/TASLP.2018.2867081
M3 - Article
AN - SCOPUS:85052699536
SN - 2329-9290
VL - 26
SP - 2436
EP - 2446
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 12
M1 - 8445656
ER -
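
The abstract describes an NMF-based pipeline: direction-dependent frequency signatures, a learned non-negative speech dictionary, and a group-sparse activation fit. As a minimal illustrative sketch only (this is not the authors' implementation; all names, array shapes, and the block-energy scoring rule are assumptions), one way such a pipeline could be wired together in NumPy is:

```python
import numpy as np

def localize_monaural(X, W, H_dir, n_iter=200, eps=1e-9):
    """Hedged sketch of NMF-based monaural DOA (not the authors' code).

    X     : (F, T) magnitude spectrogram from the single microphone.
    W     : (F, K) learned non-negative speech dictionary (universal speech model).
    H_dir : (D, F) direction-dependent magnitude responses ("signatures").

    Builds one direction-filtered copy of the dictionary per candidate
    direction, fits activations with standard multiplicative KL-NMF
    updates (dictionary held fixed), and scores each direction by the
    energy of its activation block -- a crude stand-in for the
    group-sparsity regularization described in the paper.
    """
    F, T = X.shape
    D, K = H_dir.shape[0], W.shape[1]

    # Stacked dictionary: B = [diag(h_1) W | ... | diag(h_D) W], shape (F, D*K).
    B = np.hstack([H_dir[d][:, None] * W for d in range(D)])

    # Multiplicative updates for activations A, minimizing KL(X || B A).
    A = np.random.rand(D * K, T)
    ones = np.ones_like(X)
    for _ in range(n_iter):
        A *= (B.T @ (X / (B @ A + eps))) / (B.T @ ones + eps)

    # The estimated direction is the block that explains the most signal.
    scores = np.array([np.sum(A[d * K:(d + 1) * K] ** 2) for d in range(D)])
    return int(np.argmax(scores)), scores
```

Picking the maximum-energy block is only an approximation of a proper group-sparse penalty; the point of the sketch is the structure (per-direction filtered dictionaries fit jointly to one monaural observation), not the specific selection rule.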