Articulatory speech synthesis from the fluid dynamics of the vocal apparatus

Stephen Levinson, Don Davis, Scot Slimon, Jun Huang

Research output: Contribution to journal (Article)

Abstract

This book addresses the problem of articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production in the vocal tract. Unlike conventional methods based on analysis/synthesis using the well-known source-filter model, which assumes the independence of the excitation and filter, we treat the entire vocal apparatus as one mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional time-varying mechanism, and the sound propagation inside it is due to the non-planar propagation of acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations. We propose a combined minimum energy and minimum jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method obtains a close match to the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs. The modulated air stream then excites the moving vocal tract. This method shows strong evidence for source-filter interaction. Based on our results, we propose that the articulatory speech production model has the potential to synthesize speech and provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.

Original language: English (US)
Pages (from-to): 1-118
Number of pages: 118
Journal: Synthesis Lectures on Speech and Audio Processing
Volume: 9
DOI: 10.2200/S00398ED1V01Y201112SAP009
State: Published - Jul 19 2012

Fingerprint

Speech synthesis
fluid dynamics
Acoustic waves
synthesis
filters
acoustics
Speech analysis
Air
Parameterization
compressible fluids
phonetics
Navier Stokes equations
sound propagation
Aerodynamics
Signal processing
Physics

Keywords

  • Navier-Stokes equations
  • articulatory dynamics
  • articulatory speech synthesis
  • computational fluid dynamics
  • human vocal apparatus

ASJC Scopus subject areas

  • Signal Processing
  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

Cite this

Articulatory speech synthesis from the fluid dynamics of the vocal apparatus. / Levinson, Stephen; Davis, Don; Slimon, Scot; Huang, Jun.

In: Synthesis Lectures on Speech and Audio Processing, Vol. 9, 19.07.2012, p. 1-118.

@article{416aeda047dc49eb9c3c7a696a011a33,
title = "Articulatory speech synthesis from the fluid dynamics of the vocal apparatus",
abstract = "This book addresses the problem of articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production in the vocal tract. Unlike conventional methods based on analysis/synthesis using the well-known source-filter model, which assumes the independence of the excitation and filter, we treat the entire vocal apparatus as one mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional time-varying mechanism, and the sound propagation inside it is due to the non-planar propagation of acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations. We propose a combined minimum energy and minimum jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method obtains a close match to the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs. The modulated air stream then excites the moving vocal tract. This method shows strong evidence for source-filter interaction. Based on our results, we propose that the articulatory speech production model has the potential to synthesize speech and provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.",
keywords = "Navier-Stokes equations, articulatory dynamics, articulatory speech synthesis, computational fluid dynamics, human vocal apparatus",
author = "Stephen Levinson and Don Davis and Scot Slimon and Jun Huang",
year = "2012",
month = "7",
day = "19",
doi = "10.2200/S00398ED1V01Y201112SAP009",
language = "English (US)",
volume = "9",
pages = "1--118",
journal = "Synthesis Lectures on Speech and Audio Processing",
issn = "1932-121X",
publisher = "Morgan and Claypool Publishers",

}

TY - JOUR

T1 - Articulatory speech synthesis from the fluid dynamics of the vocal apparatus

AU - Levinson, Stephen

AU - Davis, Don

AU - Slimon, Scot

AU - Huang, Jun

PY - 2012/7/19

Y1 - 2012/7/19

N2 - This book addresses the problem of articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production in the vocal tract. Unlike conventional methods based on analysis/synthesis using the well-known source-filter model, which assumes the independence of the excitation and filter, we treat the entire vocal apparatus as one mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional time-varying mechanism, and the sound propagation inside it is due to the non-planar propagation of acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations. We propose a combined minimum energy and minimum jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method obtains a close match to the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs. The modulated air stream then excites the moving vocal tract. This method shows strong evidence for source-filter interaction. Based on our results, we propose that the articulatory speech production model has the potential to synthesize speech and provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.

AB - This book addresses the problem of articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production in the vocal tract. Unlike conventional methods based on analysis/synthesis using the well-known source-filter model, which assumes the independence of the excitation and filter, we treat the entire vocal apparatus as one mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional time-varying mechanism, and the sound propagation inside it is due to the non-planar propagation of acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations. We propose a combined minimum energy and minimum jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method obtains a close match to the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs. The modulated air stream then excites the moving vocal tract. This method shows strong evidence for source-filter interaction. Based on our results, we propose that the articulatory speech production model has the potential to synthesize speech and provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.

KW - Navier-Stokes equations

KW - articulatory dynamics

KW - articulatory speech synthesis

KW - computational fluid dynamics

KW - human vocal apparatus

UR - http://www.scopus.com/inward/record.url?scp=84864557939&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864557939&partnerID=8YFLogxK

U2 - 10.2200/S00398ED1V01Y201112SAP009

DO - 10.2200/S00398ED1V01Y201112SAP009

M3 - Article

AN - SCOPUS:84864557939

VL - 9

SP - 1

EP - 118

JO - Synthesis Lectures on Speech and Audio Processing

JF - Synthesis Lectures on Speech and Audio Processing

SN - 1932-121X

ER -