TY - GEN
T1 - An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources
AU - Xiao, Xiong
AU - Zhao, Shengkui
AU - Nguyen, Thi Ngoc Tho
AU - Jones, Douglas L.
AU - Chng, Eng Siong
AU - Li, Haizhou
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/5/18
Y1 - 2016/5/18
N2 - This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually only use low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach to use all frequencies. In our work, time-frequency (TF) bins dominated by only one source are first detected using a combination of noise-floor tracking, onset detection and coherence test. For each selected TF bin, the largest eigenvector of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors of all frequencies, the steering vectors of all frequencies of the sources are used in the distribution function. The DOAs of the sources can be estimated by maximizing the likelihood of the eigenvectors using an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly improves the root-mean-square error (RMSE) for DOA estimation of multiple speech sources compared to the MUSIC algorithm implemented on the single-source dominated TF bins and our previous clustering approach.
AB - This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually only use low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach to use all frequencies. In our work, time-frequency (TF) bins dominated by only one source are first detected using a combination of noise-floor tracking, onset detection and coherence test. For each selected TF bin, the largest eigenvector of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors of all frequencies, the steering vectors of all frequencies of the sources are used in the distribution function. The DOAs of the sources can be estimated by maximizing the likelihood of the eigenvectors using an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly improves the root-mean-square error (RMSE) for DOA estimation of multiple speech sources compared to the MUSIC algorithm implemented on the single-source dominated TF bins and our previous clustering approach.
KW - direction of arrival
KW - eigenvector clustering
KW - expectation-maximization
KW - microphone arrays
KW - spatial covariance
UR - http://www.scopus.com/inward/record.url?scp=84973343814&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973343814&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472895
DO - 10.1109/ICASSP.2016.7472895
M3 - Conference contribution
AN - SCOPUS:84973343814
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6330
EP - 6334
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -