TY - GEN
T1 - Comparison of algorithms for speaker identification under adverse far-field recording conditions with extremely short utterances
AU - Tang, Hao
AU - Chen, Zhixiong
AU - Huang, Thomas S.
PY - 2008
Y1 - 2008
N2 - In this paper, we compare the state-of-the-art algorithms for text-independent speaker identification under adverse far-field recording conditions with extremely short training and testing utterances. The algorithms include both the generative and discriminative methods. For the generative methods, three variants of the original Gaussian Mixture Model (GMM) and the Universal Background Model adapted Gaussian Mixture Model (UBM-GMM) are involved. For the discriminative methods, two kernel-based algorithms, namely, the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM), are considered. The comparison mainly focuses on the speaker identification accuracy and the speed of the individual algorithms (for both training and testing) as well as the sparseness of the resulting model for the kernel-based methods. Finally, we demonstrate through experiments that multi-channel fusion of the far-field recordings yields improved performance across all the above algorithms.
UR - http://www.scopus.com/inward/record.url?scp=49249137203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49249137203&partnerID=8YFLogxK
U2 - 10.1109/ICNSC.2008.4525324
DO - 10.1109/ICNSC.2008.4525324
M3 - Conference contribution
AN - SCOPUS:49249137203
SN - 9781424416851
T3 - Proceedings of 2008 IEEE International Conference on Networking, Sensing and Control, ICNSC
SP - 796
EP - 801
BT - Proceedings of 2008 IEEE International Conference on Networking, Sensing and Control, ICNSC
T2 - 2008 IEEE International Conference on Networking, Sensing and Control, ICNSC
Y2 - 6 April 2008 through 8 April 2008
ER -