TY - GEN
T1 - Unsupervised learning of HMM topology for text-dependent speaker verification
AU - Liu, Ming
AU - Huang, Thomas
PY - 2006
Y1 - 2006
N2 - Text-dependent speaker verification usually achieves better performance than text-independent systems because of the constraint that the enrollment and test utterances share the same phonetic content. However, the enrollment data for a text-dependent system is usually very limited, so Expectation-Maximization (EM) training of an HMM suffers from noisy estimates. Adaptation is a popular solution in this scenario: the target model is formed by adapting a generic model using the limited speaker-specific training data. Although the adaptation scheme can tolerate much less training data than direct EM training, the traditional method does not account for the fact that the HMM topology may differ from speaker to speaker. This topology information further distinguishes the target speaker from impostors. In this paper, we propose an unsupervised learning method to learn the HMM topology for each speaker. The experimental results indicate that, with topology learning, the framework is more effective than traditional adaptation methods. In the pure acoustic matching experiments, the proposed method is the best system both with an extremely small amount of enrollment data (one training utterance) and with a moderate amount of training data. This is mainly due to explicitly including the label information in background modeling and to the discriminative capability of unsupervised learning of the HMM topology.
AB - Text-dependent speaker verification usually achieves better performance than text-independent systems because of the constraint that the enrollment and test utterances share the same phonetic content. However, the enrollment data for a text-dependent system is usually very limited, so Expectation-Maximization (EM) training of an HMM suffers from noisy estimates. Adaptation is a popular solution in this scenario: the target model is formed by adapting a generic model using the limited speaker-specific training data. Although the adaptation scheme can tolerate much less training data than direct EM training, the traditional method does not account for the fact that the HMM topology may differ from speaker to speaker. This topology information further distinguishes the target speaker from impostors. In this paper, we propose an unsupervised learning method to learn the HMM topology for each speaker. The experimental results indicate that, with topology learning, the framework is more effective than traditional adaptation methods. In the pure acoustic matching experiments, the proposed method is the best system both with an extremely small amount of enrollment data (one training utterance) and with a moderate amount of training data. This is mainly due to explicitly including the label information in background modeling and to the discriminative capability of unsupervised learning of the HMM topology.
KW - HMM topology
KW - Speaker verification
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=44949136622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=44949136622&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:44949136622
SN - 9781604234497
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 921
EP - 924
BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
PB - International Speech Communication Association
T2 - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Y2 - 17 September 2006 through 21 September 2006
ER -