TY - GEN
T1 - Exploring discriminative learning for text-independent speaker recognition
AU - Liu, Ming
AU - Zhang, Zhengyou
AU - Hasegawa-Johnson, Mark
AU - Huang, Thomas S.
PY - 2007
Y1 - 2007
AB - Speaker verification is the task of verifying the claimed identity of a speaker from that speaker's speech signal (voice print). To learn a similarity score between each pair of target and trial utterances, we investigated two discriminative learning frameworks: Fisher mapping followed by SVM learning, and utterance transform followed by Iterative Cohort Modeling (ICM). In both methods, a mapping converts a speech utterance from a variable-length acoustic feature sequence into a fixed-dimensional vector. SVM learning constructs a classifier in the mapped vector space for speaker verification. ICM learns a metric in this vector space by incorporating discriminative learning methods. The learned metric is then used by a nearest-neighbor classifier for speaker verification. Experiments on the NIST02 corpus show that both discriminative learning methods outperform the baseline GMM-UBM system. Furthermore, we observe that the ICM-based method is more effective than the SVM-based method, indicating that the metric learning scheme constructs a better metric in the mapped vector space.
UR - http://www.scopus.com/inward/record.url?scp=46449120364&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=46449120364&partnerID=8YFLogxK
U2 - 10.1109/icme.2007.4284585
DO - 10.1109/icme.2007.4284585
M3 - Conference contribution
AN - SCOPUS:46449120364
SN - 1424410177
SN - 9781424410170
T3 - Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
SP - 56
EP - 59
BT - Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
PB - IEEE Computer Society
T2 - IEEE International Conference on Multimedia and Expo, ICME 2007
Y2 - 2 July 2007 through 5 July 2007
ER -