Fishervoice and semi-supervised speaker clustering

Stephen M. Chu, Hao Tang, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Speaker subspace modeling has become increasingly important in speaker recognition, diarization, and clustering. Principal component analysis (PCA) is a popular linear subspace learning technique, and the approach that represents an arbitrary utterance or speaker as a linear combination of a set of PCA-derived basis voices is known as the eigenvoice approach. In this paper, a novel technique, the fishervoice approach, is proposed. It is based on linear discriminant analysis (LDA), another successful linear subspace learning technique, which provides an optimized low-dimensional representation of utterances or speakers with a focus on the most discriminative basis voices. We apply the fishervoice approach to speaker clustering in a semi-supervised manner and show that it significantly outperforms the eigenvoice approach in all our experiments on the GALE Mandarin dataset.
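
The contrast the abstract draws between a PCA basis and a linear discriminant analysis basis can be illustrated with a minimal sketch. This is not the paper's pipeline: the 2-D synthetic vectors stand in for utterance representations, the speaker labels are assumed to be available for the subspace-learning step, and all function names (`pca_direction`, `lda_direction`, `fisher_ratio`) are hypothetical. The data are built so that the largest variance lies along a direction shared by both speakers, while the speakers differ along a low-variance direction.

```python
import numpy as np

def pca_direction(X):
    """First principal direction of X (rows are utterance vectors)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the PCA directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for label in np.unique(y):
        Xk = X[y == label]
        mu = Xk.mean(axis=0)
        Sw += (Xk - mu).T @ (Xk - mu)
        diff = (mu - overall_mean)[:, None]
        Sb += len(Xk) * (diff @ diff.T)
    return Sw, Sb

def lda_direction(X, y, reg=1e-6):
    """Leading Fisher discriminant direction (maximizes Sb relative to Sw)."""
    Sw, Sb = scatter_matrices(X, y)
    Sw = Sw + reg * np.eye(Sw.shape[0])  # small ridge keeps Sw invertible
    # Whiten with the Cholesky factor so the problem becomes symmetric.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, V = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    w = L_inv.T @ V[:, np.argmax(evals)]
    return w / np.linalg.norm(w)

def fisher_ratio(X, y, w):
    """Between-class over within-class scatter after projecting onto w."""
    Sw, Sb = scatter_matrices(X, y)
    return float(w @ Sb @ w) / float(w @ Sw @ w)

# Synthetic stand-in data (an assumption for illustration only):
# axis 0 carries large variation common to both speakers,
# axis 1 carries the small but speaker-specific offset.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
means = np.array([[0.0, -1.0], [0.0, 1.0]])
X = means[y] + rng.normal(size=(2 * n, 2)) * np.array([5.0, 0.5])

w_pca = pca_direction(X)
w_lda = lda_direction(X, y)
ratio_pca = fisher_ratio(X, y, w_pca)
ratio_lda = fisher_ratio(X, y, w_lda)
print("PCA direction:", w_pca, "Fisher ratio:", ratio_pca)
print("LDA direction:", w_lda, "Fisher ratio:", ratio_lda)
```

In this toy setting the PCA direction follows the high-variance shared axis, whereas the discriminant direction follows the axis that separates the two speakers, which is the intuition behind preferring a discriminative basis when the goal is to tell speakers apart.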

Original language: English (US)
Title of host publication: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Pages: 4089-4092
Number of pages: 4
State: Published - Sep 23 2009
Event: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan, Province of China
Duration: Apr 19 2009 → Apr 24 2009

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Country: Taiwan, Province of China
City: Taipei
Period: 4/19/09 → 4/24/09

Keywords

  • Eigenvoice
  • Fisher-voice
  • Linear subspace learning
  • Semi-supervised speaker clustering

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
