Fishervoice and semi-supervised speaker clustering

Stephen M. Chu, Hao Tang, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Speaker subspace modeling has become increasingly important in speaker recognition, diarization, and clustering. Principal component analysis (PCA) is a popular linear subspace learning technique, and the approach that represents an arbitrary utterance or speaker as a linear combination of a set of PCA-derived basis voices is known as the eigenvoice approach. In this paper, a novel technique, the fishervoice approach, is proposed. It is based on linear discriminant analysis (LDA), another successful linear subspace learning technique, which provides an optimized low-dimensional representation of utterances or speakers that focuses on the most discriminative basis voices. We apply the fishervoice approach to speaker clustering in a semi-supervised manner and show that it significantly outperforms the eigenvoice approach in all our experiments on the GALE Mandarin dataset.
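The contrast between a PCA basis and an LDA basis can be illustrated with a small sketch. This is not the authors' implementation: it assumes each utterance is already summarized as a fixed-length vector, uses synthetic data in place of real speech features, and the helper names (`make_toy_data`, `pca_basis`, `lda_basis`, `fisher_ratio`) are hypothetical. The toy data puts the speaker differences on a low-variance axis and a large speaker-independent nuisance variation on another axis, so the direction of greatest variance is not the most discriminative one.

```python
import numpy as np

def make_toy_data(n_per=50, d=10, seed=0):
    """Synthetic 'utterance vectors' for 3 speakers (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    scales = np.ones(d)
    scales[1] = 8.0  # axis 1: large nuisance variance shared by all speakers
    X, y = [], []
    for c, offset in enumerate([-2.0, 0.0, 2.0]):
        mean = np.zeros(d)
        mean[0] = offset  # axis 0: speakers differ only here
        X.append(mean + rng.normal(size=(n_per, d)) * scales)
        y.append(np.full(n_per, c))
    return np.vstack(X), np.concatenate(y)

def pca_basis(X, k):
    """Top-k principal directions (as columns), label-free."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def lda_basis(X, y, k, reg=1e-6):
    """Top-k Fisher discriminant directions (as columns), using labels y."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)  # small ridge term keeps Sw invertible
    # Eigenvectors of Sw^{-1} Sb, sorted by decreasing eigenvalue.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:k]
    return evecs[:, order].real

def fisher_ratio(Z, y):
    """Between-class over within-class scatter of projected data Z."""
    mu = Z.mean(axis=0)
    sw = sb = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        sw += ((Zc - mc) ** 2).sum()
        sb += len(Zc) * ((mc - mu) ** 2).sum()
    return sb / sw

X, y = make_toy_data()
W_pca = pca_basis(X, 1)
W_lda = lda_basis(X, y, 1)
r_pca = fisher_ratio(X @ W_pca, y)
r_lda = fisher_ratio(X @ W_lda, y)
print(f"class separation, PCA direction: {r_pca:.3f}")
print(f"class separation, LDA direction: {r_lda:.3f}")
```

On data built this way, the one-dimensional LDA projection separates the speakers better than the one-dimensional PCA projection, because PCA follows the high-variance nuisance axis while LDA uses the speaker labels to find the axis along which speakers differ. How labels are obtained in the semi-supervised setting is not specified in the abstract and is not modeled here.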

Original language: English (US)
Title of host publication: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Number of pages: 4
State: Published - 2009
Externally published: Yes
Event: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan, Province of China
Duration: Apr 19 2009 → Apr 24 2009

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149


Other: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Country/Territory: Taiwan, Province of China


Keywords

  • Eigenvoice
  • Fisher-voice
  • Linear subspace learning
  • Semi-supervised speaker clustering

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

