Abstract
This paper discusses a methodology that uses a minimal set of sentences to adapt an existing TTS duration model to capture inter-speaker variation. The assumption is that the original duration database contains information about both language-specific and speaker-specific duration characteristics. When training a duration model for a new speaker, only the speaker-specific information needs to be modeled, so the amount of training data can be reduced drastically. Results from several experiments are compared and discussed.
Original language | English (US) |
---|---|
Pages | 81-86 |
Number of pages | 6 |
State | Published - 1998 |
Externally published | Yes |
Event | 3rd ESCA/COCOSDA Workshop on Speech Synthesis, SSW 1998, Blue Mountains, Australia (Nov 26 1998 → Nov 29 1998) |
Conference
Conference | 3rd ESCA/COCOSDA Workshop on Speech Synthesis, SSW 1998 |
---|---|
Country/Territory | Australia |
City | Blue Mountains |
Period | 11/26/98 → 11/29/98 |
ASJC Scopus subject areas
- Language and Linguistics
- Cultural Studies