Generative model-based speaker clustering via mixture of von Mises-Fisher distributions

Hao Tang, Stephen M. Chu, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a generative model-based speaker clustering algorithm in the maximum a posteriori adapted Gaussian mixture model (GMM) mean supervector space. The algorithm can be viewed as an extension of the standard expectation maximization algorithm for fitting a mixture model to the data, which iterates between two steps - a sample re-assignment step (E-step) and a model re-estimation step (M-step) - until it converges. The directional scattering patterns of GMM mean supervectors suggest that we employ a mixture of von Mises-Fisher distributions in the model re-estimation step. In the sample re-assignment step, four sample-to-mixture assignment strategies, namely soft, hard, stochastic, and deterministic annealing assignments, are used. Our experiments on the GALE Mandarin dataset show that the use of a mixture of von Mises-Fisher distributions as the underlying model yields significantly higher speaker clustering accuracies than the use of a mixture of Gaussian distributions. It is further shown that deterministic annealing assignment outperforms soft assignment, that soft assignment is comparable to stochastic assignment, and that both soft and stochastic assignments outperform hard assignment.
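
As a rough illustration of the two-step iteration described in the abstract, the following Python sketch alternates a sample re-assignment step and a model re-estimation step for a mixture of von Mises-Fisher components on unit-normalized vectors. It is not the authors' implementation. It assumes a single fixed concentration `kappa` shared by all components (so the von Mises-Fisher normalizing constant cancels and is never computed), uses a simple farthest-point initialization, and covers only the soft and hard assignment strategies. The function name `fit_vmf_mixture` and all of its parameters are hypothetical.

```python
import numpy as np


def fit_vmf_mixture(X, n_clusters, kappa=20.0, assignment="soft", n_iter=50):
    """Illustrative EM-style clustering on the unit hypersphere.

    Assumption: every component shares one fixed concentration `kappa`,
    so the posterior depends only on the mixing weights and the cosine
    between each sample and each mean direction.
    """
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto the unit sphere
    n = X.shape[0]

    # Farthest-point initialization (an assumption, chosen for determinism):
    # start from the first sample, then repeatedly add the sample whose
    # largest cosine to the already chosen means is smallest.
    means = [X[0]]
    for _ in range(1, n_clusters):
        closest = np.max(X @ np.array(means).T, axis=1)
        means.append(X[np.argmin(closest)])
    mu = np.array(means)
    alpha = np.full(n_clusters, 1.0 / n_clusters)  # mixing weights

    for _ in range(n_iter):
        # Sample re-assignment step (E-step): posterior of each component.
        logits = np.log(alpha) + kappa * (X @ mu.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        if assignment == "hard":
            # Hard assignment: each sample goes wholly to its best component.
            hard = np.zeros_like(resp)
            hard[np.arange(n), resp.argmax(axis=1)] = 1.0
            resp = hard

        # Model re-estimation step (M-step): update weights and mean directions.
        alpha = np.maximum(resp.mean(axis=0), 1e-12)  # avoid log(0) for empty clusters
        alpha /= alpha.sum()
        sums = resp.T @ X
        norms = np.linalg.norm(sums, axis=1, keepdims=True)
        mu = np.where(norms > 0, sums / np.maximum(norms, 1e-12), mu)

    return resp.argmax(axis=1), mu
```

Calling `fit_vmf_mixture(X, 2, assignment="hard")` on an array `X` of shape `(n_samples, n_features)` returns one cluster label per row and the estimated unit-length mean directions. A fuller treatment would also re-estimate a concentration per component, and would add the stochastic and deterministic annealing strategies named in the abstract.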

Original language: English (US)
Title of host publication: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Pages: 4101-4104
Number of pages: 4
State: Published - Sep 23 2009
Event: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan, Province of China
Duration: Apr 19 2009 to Apr 24 2009

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 4/19/09 to 4/24/09

Keywords

  • EM algorithm
  • GMM mean supervectors
  • Mixture of von Mises-Fisher distributions
  • Model-based clustering

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
