Formant trajectories for acoustic-to-articulatory inversion

I. Yücel Özbek, Mark Allan Hasegawa-Johnson, Mübeccel Demirekler

Research output: Contribution to journal › Conference article

Abstract

This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. For this purpose, formant frequencies and formant spectral amplitudes are automatically estimated from audio, and are treated as observations for the purpose of estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepstral (MFCC) observations is modified by using formants and energies to either replace or augment the MFCC observation vector. The augmented observation results in 3.4% lower RMS error, and 2% higher correlation coefficient, than the baseline MFCC observation. Improvement is especially good for stop consonants, possibly because formant tracking provides information about the acoustic resonances that would be otherwise unavailable during stop closure and release.
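As a rough illustration of the comparison the abstract reports, the sketch below concatenates per-frame MFCC and formant features into an augmented observation vector and scores a predicted EMA coil trajectory by RMS error and Pearson correlation. The function names are hypothetical, and reading the reported 3.4% and 2% figures as relative changes against the MFCC baseline is an assumption; the abstract does not say how they were computed, and the regression model itself is not shown here.

```python
import math


def augment(mfcc_frames, formant_frames):
    """Concatenate per-frame MFCC and formant feature vectors (hypothetical helper)."""
    if len(mfcc_frames) != len(formant_frames):
        raise ValueError("feature streams must have the same number of frames")
    return [list(m) + list(f) for m, f in zip(mfcc_frames, formant_frames)]


def rms_error(measured, predicted):
    """Root-mean-square error between two equal-length trajectories."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def pearson_correlation(measured, predicted):
    """Pearson correlation coefficient between two equal-length trajectories."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def relative_change(baseline, candidate):
    """Percent change of candidate relative to baseline (assumed reading of the abstract)."""
    return 100.0 * (candidate - baseline) / baseline
```

With these helpers, a negative `relative_change` in RMS error and a positive one in correlation would both indicate that the augmented observation outperforms the baseline.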

Original language: English (US)
Pages (from-to): 2807-2810
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - Nov 26 2009
Event: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom
Duration: Sep 6 2009 - Sep 10 2009

Keywords

  • Acoustic-to-articulatory inversion
  • Formant tracking
  • GMM regression

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems

Cite this

Formant trajectories for acoustic-to-articulatory inversion. / Özbek, I. Yücel; Hasegawa-Johnson, Mark Allan; Demirekler, Mübeccel.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 26.11.2009, p. 2807-2810.

@article{ad039b6a409d46c296eb5244891c1ebb,
title = "Formant trajectories for acoustic-to-articulatory inversion",
abstract = "This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. For this purpose, formant frequencies and formant spectral amplitudes are automatically estimated from audio, and are treated as observations for the purpose of estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepstral (MFCC) observations is modified by using formants and energies to either replace or augment the MFCC observation vector. The augmented observation results in 3.4{\%} lower RMS error, and 2{\%} higher correlation coefficient, than the baseline MFCC observation. Improvement is especially good for stop consonants, possibly because formant tracking provides information about the acoustic resonances that would be otherwise unavailable during stop closure and release.",
keywords = "Acoustic-to-articulatory inversion, Formant tracking, GMM regression",
author = "{\"O}zbek, {I. Y{\"u}cel} and Hasegawa-Johnson, {Mark Allan} and Demirekler, M{\"u}beccel",
year = "2009",
month = "11",
day = "26",
language = "English (US)",
pages = "2807--2810",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Formant trajectories for acoustic-to-articulatory inversion

AU - Özbek, I. Yücel

AU - Hasegawa-Johnson, Mark Allan

AU - Demirekler, Mübeccel

PY - 2009/11/26

Y1 - 2009/11/26

AB - This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. For this purpose, formant frequencies and formant spectral amplitudes are automatically estimated from audio, and are treated as observations for the purpose of estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepstral (MFCC) observations is modified by using formants and energies to either replace or augment the MFCC observation vector. The augmented observation results in 3.4% lower RMS error, and 2% higher correlation coefficient, than the baseline MFCC observation. Improvement is especially good for stop consonants, possibly because formant tracking provides information about the acoustic resonances that would be otherwise unavailable during stop closure and release.

KW - Acoustic-to-articulatory inversion

KW - Formant tracking

KW - GMM regression

UR - http://www.scopus.com/inward/record.url?scp=70450216609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70450216609&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:70450216609

SP - 2807

EP - 2810

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -