Abstract
This book addresses the problem of articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production within them. Unlike conventional methods based on analysis/synthesis using the well-known source-filter model, which assumes that the excitation and the filter are independent, we treat the entire vocal apparatus as one mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional, time-varying mechanism, and sound propagation inside it is modeled as the non-planar propagation of acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations. We propose a combined minimum-energy and minimum-jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method closely matches the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs, and the modulated air stream then excites the moving vocal tract. This method shows strong evidence of source-filter interaction. Based on our results, we propose that the articulatory speech production model has the potential to synthesize speech and to provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.
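To illustrate what a minimum-jerk criterion implies for an articulatory trajectory, the following is a minimal sketch, not the book's method. It assumes the textbook point-to-point case: a movement between two target positions that starts and ends at rest, for which the jerk-minimizing path is a fifth-order polynomial in normalized time. The function name `minimum_jerk`, the two-parameter articulator vector, and the 200 ms duration are hypothetical choices for the example; the combined minimum-energy term described in the abstract is not modeled here.

```python
import numpy as np

def minimum_jerk(x0, xf, duration, n_samples=101):
    """Point-to-point minimum-jerk trajectory (illustrative sketch).

    Assumes zero velocity and acceleration at both endpoints, for which
    the jerk-minimizing path is the quintic 10*tau^3 - 15*tau^4 + 6*tau^5.
    x0, xf: start and target positions (scalars or 1-D arrays).
    Returns (t, x) where x has shape (n_samples, n_parameters).
    """
    t = np.linspace(0.0, duration, n_samples)
    tau = t / duration                               # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5       # smooth 0 -> 1 blend
    x0 = np.asarray(x0, dtype=float)
    xf = np.asarray(xf, dtype=float)
    return t, x0 + np.outer(s, xf - x0)

# Hypothetical example: two articulatory parameters moving between
# two phonetic targets over 0.2 s.
t, x = minimum_jerk([0.0, 1.0], [1.0, 3.0], duration=0.2)
```

The trajectory reaches the target exactly and changes smoothly in between, which is the qualitative behavior the abstract describes as avoiding abrupt changes in the articulatory trajectory.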
Original language | English (US) |
---|---|
Pages (from-to) | 1-118 |
Number of pages | 118 |
Journal | Synthesis Lectures on Speech and Audio Processing |
Volume | 9 |
DOIs | |
State | Published - Jul 19 2012 |
Keywords
- Navier-Stokes equations
- articulatory dynamics
- articulatory speech synthesis
- computational fluid dynamics
- human vocal apparatus
ASJC Scopus subject areas
- Signal Processing
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering