TY - GEN
T1 - Frequency domain correspondence for speaker normalization
AU - Liu, Ming
AU - Zhou, Xi
AU - Hasegawa-Johnson, Mark
AU - Huang, Thomas S.
AU - Zhang, Zhengyou
PY - 2007
Y1 - 2007
N2 - Due to physiological and linguistic differences between speakers, the spectral patterns of the same phoneme produced by two speakers can be quite dissimilar. Without appropriate alignment of the frequency axis, this inter-speaker variation reduces modeling efficiency and degrades performance. In this paper, a novel data-driven framework is proposed to align the frequency axes of two speakers. This alignment is essentially a frequency domain correspondence between the two speakers. To establish the correspondence, we formulate the task as an optimal matching problem. Local matching is achieved by comparing local features of the spectrogram along the frequency bins, which captures the similarity of local patterns across different frequency bins. After local matching, dynamic programming is applied to find the globally optimal alignment between the two frequency axes. Experiments on TIDIGITS and TIMIT show the effectiveness of this method.
UR - http://www.scopus.com/inward/record.url?scp=56149118252&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56149118252&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:56149118252
SN - 9781605603162
T3 - International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
SP - 45
EP - 48
BT - 8th Annual Conference of the International Speech Communication Association (Interspeech 2007)
PB - International Speech Communication Association
T2 - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Y2 - 27 August 2007 through 31 August 2007
ER -