An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources

Xiong Xiao, Shengkui Zhao, Thi Ngoc Tho Nguyen, Douglas L Jones, Eng Siong Chng, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually use only low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach that uses all frequencies. Time-frequency (TF) bins dominated by a single source are first detected using a combination of noise-floor tracking, onset detection, and a coherence test. For each selected TF bin, the principal eigenvector (the one associated with the largest eigenvalue) of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors from all frequencies, the source steering vectors at every frequency are used in the distribution function. The source DOAs are then estimated by maximizing the likelihood of the eigenvectors with an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly reduces the root-mean-square error (RMSE) of DOA estimation for multiple speech sources compared with the MUSIC algorithm applied to the single-source-dominated TF bins and with our previous clustering approach.
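The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the form of the component density, so the code assumes a Watson-like density proportional to exp(kappa * |a^H v|^2), a far-field uniform linear array, equal concentration kappa for all sources, and a grid-search M-step. All names (steering, principal_eigvec, em_doa, simulate_bins) and all parameter values are hypothetical, and the detection of single-source TF bins is skipped by simulating such bins directly.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def steering(freqs, thetas, n_mics, spacing):
    """Unit-norm far-field steering vectors of a uniform linear array.
    Returns an array of shape (len(freqs), len(thetas), n_mics)."""
    m = np.arange(n_mics)
    delay = spacing * np.cos(thetas)[None, :, None] * m / C  # seconds
    return np.exp(-2j * np.pi * freqs[:, None, None] * delay) / np.sqrt(n_mics)

def principal_eigvec(R):
    """Eigenvector of a Hermitian matrix for its largest eigenvalue."""
    _, vecs = np.linalg.eigh(R)  # eigenvalues are returned in ascending order
    return vecs[:, -1]

def em_doa(freqs, V, n_src, n_mics, spacing, kappa=20.0, n_iter=30):
    """EM over a 1-degree DOA grid. V[n] is the unit-norm eigenvector of
    bin n and freqs[n] its frequency in Hz. Returns DOAs in degrees."""
    grid = np.deg2rad(np.arange(0.0, 180.5, 1.0))
    A = steering(freqs, grid, n_mics, spacing)               # (N, G, M)
    # Squared cosine similarity between each eigenvector and each candidate
    # steering vector at that bin's own frequency, so all frequencies count.
    S = np.abs(np.einsum('ngm,nm->ng', A.conj(), V)) ** 2    # (N, G)
    idx = np.linspace(0, len(grid) - 1, n_src + 2)[1:-1].round().astype(int)
    weights = np.full(n_src, 1.0 / n_src)
    for _ in range(n_iter):
        # E-step: responsibilities under the assumed Watson-like density.
        logp = np.log(weights + 1e-12) + kappa * S[:, idx]   # (N, K)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, then pick the grid DOA that
        # maximizes the responsibility-weighted similarity per source.
        weights = gamma.mean(axis=0)
        idx = np.argmax(gamma.T @ S, axis=1)
    return np.rad2deg(grid[idx])

def simulate_bins(doas_deg, n_bins, n_mics, spacing, rng, n_snap=16):
    """Simulate single-source TF bins and return (freqs, eigenvectors)."""
    freqs = rng.uniform(300.0, 7000.0, size=n_bins)
    thetas = np.deg2rad(np.asarray(doas_deg, dtype=float))
    src = rng.integers(len(thetas), size=n_bins)
    V = np.empty((n_bins, n_mics), dtype=complex)
    for n in range(n_bins):
        a = steering(freqs[n:n + 1], thetas[src[n]:src[n] + 1],
                     n_mics, spacing)[0, 0]
        s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
        noise = 0.05 * (rng.standard_normal((n_mics, n_snap))
                        + 1j * rng.standard_normal((n_mics, n_snap)))
        X = np.outer(a, s) + noise
        V[n] = principal_eigvec(X @ X.conj().T / n_snap)  # spatial covariance
    return freqs, V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs, V = simulate_bins([40.0, 110.0], 400, 6, 0.04, rng)
    print(np.sort(em_doa(freqs, V, n_src=2, n_mics=6, spacing=0.04)))
```

With the assumed 4 cm spacing, spatial aliasing starts near 4.3 kHz, so the simulated bins up to 7 kHz include aliased frequencies. Each such bin is ambiguous on its own, but the M-step sums the similarity over bins at many frequencies, and only the true DOA is consistent across all of them.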

Original language: English (US)
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6330-6334
Number of pages: 5
ISBN (Electronic): 9781479999880
DOIs: https://doi.org/10.1109/ICASSP.2016.7472895
State: Published - May 18 2016
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016 to Mar 25 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Other

Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country: China
City: Shanghai
Period: 3/20/16 to 3/25/16

Fingerprint

Direction of arrival
Eigenvalues and eigenfunctions
Bins
Microphones
Covariance matrix
Mean square error
Distribution functions

Keywords

  • direction of arrival
  • eigenvector clustering
  • expectation-maximization
  • microphone arrays
  • spatial covariance

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Xiao, X., Zhao, S., Nguyen, T. N. T., Jones, D. L., Chng, E. S., & Li, H. (2016). An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (pp. 6330-6334). [7472895] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2016-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7472895

@inproceedings{0fe2334d60834e6cbe767bf21f4d4569,
title = "An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources",
abstract = "This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually use only low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach that uses all frequencies. Time-frequency (TF) bins dominated by a single source are first detected using a combination of noise-floor tracking, onset detection, and a coherence test. For each selected TF bin, the principal eigenvector (the one associated with the largest eigenvalue) of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors from all frequencies, the source steering vectors at every frequency are used in the distribution function. The source DOAs are then estimated by maximizing the likelihood of the eigenvectors with an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly reduces the root-mean-square error (RMSE) of DOA estimation for multiple speech sources compared with the MUSIC algorithm applied to the single-source-dominated TF bins and with our previous clustering approach.",
keywords = "direction of arrival, eigenvector clustering, expectation-maximization, microphone arrays, spatial covariance",
author = "Xiong Xiao and Shengkui Zhao and Nguyen, {Thi Ngoc Tho} and Jones, {Douglas L} and Chng, {Eng Siong} and Haizhou Li",
year = "2016",
month = "5",
day = "18",
doi = "10.1109/ICASSP.2016.7472895",
language = "English (US)",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "6330--6334",
booktitle = "2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings",
address = "United States",

}

TY - GEN

T1 - An expectation-maximization eigenvector clustering approach to direction of arrival estimation of multiple speech sources

AU - Xiao, Xiong

AU - Zhao, Shengkui

AU - Nguyen, Thi Ngoc Tho

AU - Jones, Douglas L

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2016/5/18

Y1 - 2016/5/18

N2 - This paper presents an eigenvector clustering approach for estimating the direction of arrival (DOA) of multiple speech signals using a microphone array. Existing clustering approaches usually use only low frequencies to avoid spatial aliasing. In this study, we propose a probabilistic eigenvector clustering approach that uses all frequencies. Time-frequency (TF) bins dominated by a single source are first detected using a combination of noise-floor tracking, onset detection, and a coherence test. For each selected TF bin, the principal eigenvector (the one associated with the largest eigenvalue) of its spatial covariance matrix is extracted for clustering. A mixture density model is introduced to model the distribution of the eigenvectors, where each component distribution corresponds to one source and is parameterized by the source DOA. To use eigenvectors from all frequencies, the source steering vectors at every frequency are used in the distribution function. The source DOAs are then estimated by maximizing the likelihood of the eigenvectors with an expectation-maximization (EM) algorithm. Simulation and experimental results show that the proposed approach significantly reduces the root-mean-square error (RMSE) of DOA estimation for multiple speech sources compared with the MUSIC algorithm applied to the single-source-dominated TF bins and with our previous clustering approach.

KW - direction of arrival

KW - eigenvector clustering

KW - expectation-maximization

KW - microphone arrays

KW - spatial covariance

UR - http://www.scopus.com/inward/record.url?scp=84973343814&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973343814&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2016.7472895

DO - 10.1109/ICASSP.2016.7472895

M3 - Conference contribution

AN - SCOPUS:84973343814

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 6330

EP - 6334

BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -