TY - JOUR
T1 - Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network
AU - Nguyen, Thi Ngoc Tho
AU - Gan, Woon Seng
AU - Ranjan, Rishabh
AU - Jones, Douglas L.
N1 - Funding Information:
Manuscript received February 14, 2020; revised June 22, 2020; accepted August 10, 2020. Date of publication August 26, 2020; date of current version September 16, 2020. This work was supported by Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU), which is a collaboration between Singapore Telecommunications Limited (Singtel) and Nanyang Technological University (NTU) that is funded by the Singapore Government through the Industry Alignment Fund – Industry Collaboration Projects Grant. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Roland Badeau. (Corresponding author: Thi Ngoc Tho Nguyen.) Thi Ngoc Tho Nguyen, Woon-Seng Gan, and Rishabh Ranjan are with the Department of Electrical and Electronic Engineering, Nanyang Technological University, 639798 Singapore, Singapore (e-mail: nguyenth003@e.ntu.edu.sg; ewsgan@ntu.edu.sg; rishabh001@e.ntu.edu.sg).
Publisher Copyright:
© 2014 IEEE.
PY - 2020
Y1 - 2020
N2 - Many signal processing-based methods for sound source direction-of-arrival estimation produce a spatial pseudo-spectrum whose local maxima strongly indicate the source directions. Due to varying levels of noise and reverberation and varying numbers of overlapping sources, the spatial pseudo-spectra are noisy even after smoothing. In addition, the number of sources is often unknown. As a result, selecting the peaks from these spectra is susceptible to error. Convolutional neural networks have been successfully applied to many image processing problems in general and to direction-of-arrival estimation in particular. In addition, deep learning-based methods for direction-of-arrival estimation show good generalization to different environments. We propose to use a 2D convolutional neural network with multi-task learning to robustly estimate the number of sources and the directions-of-arrival from short-time spatial pseudo-spectra, which contain useful directional information from the audio input signals. This approach reduces the tendency of the neural network to learn unwanted associations between sound classes and directional information, and helps the network generalize to unseen sound classes. The simulation and experimental results show that the proposed methods outperform other direction-of-arrival estimation methods under different levels of noise and reverberation and different numbers of sources.
AB - Many signal processing-based methods for sound source direction-of-arrival estimation produce a spatial pseudo-spectrum whose local maxima strongly indicate the source directions. Due to varying levels of noise and reverberation and varying numbers of overlapping sources, the spatial pseudo-spectra are noisy even after smoothing. In addition, the number of sources is often unknown. As a result, selecting the peaks from these spectra is susceptible to error. Convolutional neural networks have been successfully applied to many image processing problems in general and to direction-of-arrival estimation in particular. In addition, deep learning-based methods for direction-of-arrival estimation show good generalization to different environments. We propose to use a 2D convolutional neural network with multi-task learning to robustly estimate the number of sources and the directions-of-arrival from short-time spatial pseudo-spectra, which contain useful directional information from the audio input signals. This approach reduces the tendency of the neural network to learn unwanted associations between sound classes and directional information, and helps the network generalize to unseen sound classes. The simulation and experimental results show that the proposed methods outperform other direction-of-arrival estimation methods under different levels of noise and reverberation and different numbers of sources.
KW - Direction-of-arrival estimation
KW - convolutional neural network
KW - multi-task learning
KW - multiple sound sources
KW - spatial pseudo-spectrum
UR - http://www.scopus.com/inward/record.url?scp=85090241558&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090241558&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.3019646
DO - 10.1109/TASLP.2020.3019646
M3 - Article
AN - SCOPUS:85090241558
VL - 28
SP - 2626
EP - 2637
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
SN - 2329-9290
M1 - 9178434
ER -