Comparison of algorithms for speaker identification under adverse far-field recording conditions with extremely short utterances

Hao Tang, Zhixiong Chen, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we compare state-of-the-art algorithms for text-independent speaker identification under adverse far-field recording conditions with extremely short training and testing utterances. The algorithms include both generative and discriminative methods. For the generative methods, we evaluate three variants of the original Gaussian Mixture Model (GMM) and the Universal Background Model adapted Gaussian Mixture Model (UBM-GMM). For the discriminative methods, we consider two kernel-based algorithms: the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM). The comparison focuses on speaker identification accuracy, on the training and testing speed of each algorithm, and, for the kernel-based methods, on the sparseness of the resulting model. Finally, we demonstrate through experiments that multi-channel fusion of the far-field recordings improves the performance of all of the above algorithms.
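As a rough illustration of the generative approach and of multi-channel fusion, the sketch below scores a test utterance against per-speaker models on each channel and sums the scores across channels. It is not the authors' method: the abstract does not say how the channels are fused, so the sum rule over average log-likelihoods is an assumption, and each speaker is modeled by a single diagonal-covariance Gaussian (a one-component GMM) to keep the example short. The function names, speaker labels, and toy feature frames are all hypothetical.

```python
import math

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a one-component GMM) to feature frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A variance floor keeps the model usable when very few frames are available.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(channel_frames, channel_models):
    """Pick the speaker with the highest fused score.

    channel_frames: one list of test frames per channel.
    channel_models: one {speaker: model} dict per channel.
    Fusion (assumed, not from the paper): sum of per-channel average log-likelihoods.
    """
    speakers = channel_models[0].keys()
    fused = {
        s: sum(avg_log_likelihood(frames, models[s])
               for frames, models in zip(channel_frames, channel_models))
        for s in speakers
    }
    return max(fused, key=fused.get), fused

# Toy two-dimensional "features" for two hypothetical speakers on two channels.
train = [
    {"alice": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]],
     "bob":   [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]},
    {"alice": [[0.5, 0.4], [0.6, 0.6], [0.4, 0.5]],
     "bob":   [[3.5, 3.4], [3.6, 3.6], [3.4, 3.5]]},
]
channel_models = [{s: fit_diag_gaussian(fr) for s, fr in ch.items()} for ch in train]

# A short test utterance recorded on both channels.
test_channels = [[[0.1, 0.0], [0.0, 0.1]], [[0.5, 0.5], [0.6, 0.4]]]
best, fused = identify(test_channels, channel_models)
print(best, fused)
```

Averaging the log-likelihood over frames before summing keeps channels with different numbers of frames on a comparable scale; weighting the channels or fusing at the feature level would be equally plausible choices.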

Original language: English (US)
Title of host publication: Proceedings of 2008 IEEE International Conference on Networking, Sensing and Control, ICNSC
Pages: 796-801
Number of pages: 6
State: Published - 2008
Externally published: Yes
Event: 2008 IEEE International Conference on Networking, Sensing and Control, ICNSC - Sanya, China
Duration: Apr 6 2008 to Apr 8 2008


ASJC Scopus subject areas

  • Computer Networks and Communications
  • Control and Systems Engineering
