TY - GEN
T1 - Generative model-based speaker clustering via mixture of von Mises-Fisher distributions
AU - Tang, Hao
AU - Chu, Stephen M.
AU - Huang, Thomas S.
PY - 2009
Y1 - 2009
N2 - This paper proposes a generative model-based speaker clustering algorithm in the maximum a posteriori adapted Gaussian mixture model (GMM) mean supervector space. The algorithm can be viewed as an extension of the standard expectation maximization algorithm for fitting a mixture model to the data, which iterates between two steps - a sample re-assignment step (E-step) and a model re-estimation step (M-step) - until it converges. The directional scattering patterns of GMM mean supervectors suggest that we employ a mixture of von Mises-Fisher distributions in the model re-estimation step. In the sample re-assignment step, four sample-to-mixture assignment strategies, namely soft, hard, stochastic, and deterministic annealing assignments, are used. Our experiments on the GALE Mandarin dataset show that the use of a mixture of von Mises-Fisher distributions as the underlying model yields significantly higher speaker clustering accuracies than the use of a mixture of Gaussian distributions. It is further shown that deterministic annealing assignment outperforms soft assignment, that soft assignment is comparable to stochastic assignment, and that both soft and stochastic assignments outperform hard assignment.
AB - This paper proposes a generative model-based speaker clustering algorithm in the maximum a posteriori adapted Gaussian mixture model (GMM) mean supervector space. The algorithm can be viewed as an extension of the standard expectation maximization algorithm for fitting a mixture model to the data, which iterates between two steps - a sample re-assignment step (E-step) and a model re-estimation step (M-step) - until it converges. The directional scattering patterns of GMM mean supervectors suggest that we employ a mixture of von Mises-Fisher distributions in the model re-estimation step. In the sample re-assignment step, four sample-to-mixture assignment strategies, namely soft, hard, stochastic, and deterministic annealing assignments, are used. Our experiments on the GALE Mandarin dataset show that the use of a mixture of von Mises-Fisher distributions as the underlying model yields significantly higher speaker clustering accuracies than the use of a mixture of Gaussian distributions. It is further shown that deterministic annealing assignment outperforms soft assignment, that soft assignment is comparable to stochastic assignment, and that both soft and stochastic assignments outperform hard assignment.
KW - EM algorithm
KW - GMM mean supervectors
KW - Mixture of von Mises-Fisher distributions
KW - Model-based clustering
UR - http://www.scopus.com/inward/record.url?scp=70349220967&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349220967&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2009.4960530
DO - 10.1109/ICASSP.2009.4960530
M3 - Conference contribution
AN - SCOPUS:70349220967
SN - 9781424423545
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4101
EP - 4104
BT - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
T2 - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Y2 - 19 April 2009 through 24 April 2009
ER -