Frequency domain correspondence for speaker normalization

Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, Zhengyou Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Because of physiological and linguistic differences between speakers, the spectral pattern of the same phoneme can differ considerably from one speaker to another. Without appropriate alignment along the frequency axis, this inter-speaker variation reduces modeling efficiency and degrades performance. This paper proposes a novel data-driven framework for aligning the frequency axes of two speakers. The alignment is essentially a frequency domain correspondence between the two speakers. To establish it, we formulate the task as an optimal matching problem. Local matching is achieved by comparing local features of the spectrogram along the frequency bins, which captures the similarity of local patterns across different frequency bins. After local matching, dynamic programming is applied to find the globally optimal alignment between the two frequency axes. Experiments on TIDIGITS and TIMIT clearly show the effectiveness of this method.
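
As an illustration only, the two stages named in the abstract (local matching across frequency bins, followed by dynamic programming for a global alignment) can be sketched as below. This is not the authors' implementation. The abstract does not specify the local features, the cost function, or the path constraints, so everything here is an assumption: the function names `local_cost` and `align_frequency_axes` are hypothetical, the two spectrograms are assumed to be time-aligned with shape (frequency bins, frames), each bin's local feature is taken to be its energy trajectory over time, the cost is one minus the correlation of those trajectories, and the dynamic program is a standard monotonic alignment of the kind used in dynamic time warping.

```python
import numpy as np


def local_cost(spec_a, spec_b):
    """Pairwise matching cost between frequency bins of two spectrograms.

    ASSUMPTION: both inputs have shape (n_bins, n_frames) and share the
    same time axis; the "local feature" of a bin is its trajectory over time.
    """
    a = spec_a - spec_a.mean(axis=1, keepdims=True)
    b = spec_b - spec_b.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True) + 1e-12
    b /= np.linalg.norm(b, axis=1, keepdims=True) + 1e-12
    # 1 - correlation: near 0 for similar bins, up to 2 for opposite ones.
    return 1.0 - a @ b.T


def align_frequency_axes(cost):
    """Find a monotonic minimum-cost path through the cost matrix.

    Returns a list of (bin_in_a, bin_in_b) pairs from (0, 0) to the
    last bin of each axis. ASSUMPTION: steps are (1,0), (0,1) or (1,1).
    """
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + best

    # Backtrack from the top-right corner to recover the optimal path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


# Toy check with synthetic data (not TIDIGITS or TIMIT): aligning a
# spectrogram with itself should recover the identity correspondence.
rng = np.random.default_rng(0)
spec = rng.random((8, 50))
path = align_frequency_axes(local_cost(spec, spec))
```

The path returned for real speaker pairs would generally deviate from the diagonal; how that correspondence is then used for normalization is described in the paper itself and is not reproduced here.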

Original language: English (US)
Title of host publication: 8th Annual Conference of the International Speech Communication Association (Interspeech 2007)
Publisher: International Speech Communication Association
Pages: 45-48
Number of pages: 4
ISBN (Print): 9781605603162
State: Published - 2007
Event: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007 - Antwerp, Belgium
Duration: Aug 27 2007 → Aug 31 2007

Publication series

Name: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Volume: 1
ISSN (Electronic): 1990-9772

Other

Other: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Country/Territory: Belgium
City: Antwerp
Period: 8/27/07 → 8/31/07

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Modeling and Simulation
  • Linguistics and Language
  • Communication
