FSM-based pronunciation modeling using articulatory phonological code

Chi Hu, Xiaodan Zhuang, Mark Allan Hasegawa-Johnson

Research output: Contribution to conference (Paper)

Abstract

According to articulatory phonology, the gestural score is an invariant speech representation. Though the timing schemes, i.e., the onsets and offsets, of the gestural activations may vary, the ensemble of these activations tends to remain unchanged, informing the speech content. In this work, we propose a pronunciation modeling method that uses a finite state machine (FSM) to represent the invariance of a gestural score. Given the "canonical" gestural score (CGS) of a word with a known activation timing scheme, the plausible activation onsets and offsets are recursively generated and encoded as a weighted FSM. An empirical measure is used to prune out gestural activation timing schemes that deviate too much from the CGS. Speech recognition is achieved by matching the recovered gestural activations to the FSM-encoded gestural scores of different speech contents. We carry out pilot word classification experiments using synthesized data from one speaker. The proposed pronunciation modeling achieves over 90% accuracy for a vocabulary of 139 words with no training observations, outperforming direct use of the CGS.
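The pipeline described in the abstract (recursive generation of timing variants, pruning by a deviation measure, encoding as a weighted FSM, and recognition by matching) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy gesture names and timings, the representation of a timing scheme as an ordered list of onset/offset events, the use of the inversion count relative to the canonical order as the deviation measure, and the helper names `canonical_events`, `build_fsm`, `accept_cost`, and `classify`.

```python
def canonical_events(score):
    """Order the onset/offset events of a score {gesture: (onset, offset)} by time."""
    timed = []
    for gesture, (onset, offset) in score.items():
        timed.append((onset, 0, gesture, "on"))
        timed.append((offset, 1, gesture, "off"))
    timed.sort()  # ties: onsets before offsets, then alphabetical by gesture
    return [(gesture, kind) for _, _, gesture, kind in timed]


def build_fsm(score, max_cost=2):
    """Recursively enumerate event orderings close to the canonical one.

    The result is a trie-shaped weighted FSM: `transitions` maps
    (state, event) to the next state, and `finals` maps each accepting
    state to its weight. The weight is the number of inversions relative
    to the canonical order (an assumed stand-in for the paper's measure).
    """
    canon = canonical_events(score)
    transitions, finals = {}, {}

    def expand(state, remaining, started, cost):
        if not remaining:
            finals[state] = cost
            return
        for pos, idx in enumerate(remaining):
            gesture, kind = canon[idx]
            if kind == "off" and gesture not in started:
                continue  # a gesture cannot end before it has begun
            new_cost = cost + pos  # events skipped over count as inversions
            if new_cost > max_cost:
                break  # prune: later choices deviate even more
            nxt = len(transitions) + 1  # fresh state id (0 is the start state)
            transitions[(state, canon[idx])] = nxt
            rest = remaining[:pos] + remaining[pos + 1:]
            expand(nxt, rest, started | {gesture}, new_cost)

    expand(0, tuple(range(len(canon))), frozenset(), 0)
    return transitions, finals


def accept_cost(fsm, events):
    """Return the weight of `events` in the FSM, or None if it is rejected."""
    transitions, finals = fsm
    state = 0
    for event in events:
        state = transitions.get((state, event))
        if state is None:
            return None
    return finals.get(state)


def classify(fsms, events):
    """Pick the word whose FSM accepts the events with the lowest weight."""
    scored = []
    for word, fsm in fsms.items():
        cost = accept_cost(fsm, events)
        if cost is not None:
            scored.append((cost, word))
    return min(scored)[1] if scored else None


# Toy canonical gestural scores (hypothetical values, in frames).
SCORES = {
    "A": {"lips": (0, 3), "vowel": (2, 8), "tip": (7, 10)},
    "B": {"tip": (0, 3), "vowel": (2, 8), "lips": (7, 10)},
}
FSMS = {word: build_fsm(score) for word, score in SCORES.items()}

# A timing variant of "A": the vowel gesture starts after the lips release.
OBSERVED = [("lips", "on"), ("lips", "off"), ("vowel", "on"),
            ("tip", "on"), ("vowel", "off"), ("tip", "off")]
print(classify(FSMS, OBSERVED))
```

Under these toy scores, the observed sequence contains the same ensemble of gestures as word A with one pair of events swapped, so A's FSM accepts it at weight 1 while B's FSM rejects it. No training data is involved: each FSM is derived from the canonical score alone, which mirrors the zero-training setting mentioned in the abstract.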

Original language: English (US)
Pages: 2274-2277
Number of pages: 4
State: Published - Dec 1 2010
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: Sep 26 2010 - Sep 30 2010

Keywords

  • Articulatory phonology
  • Finite state machine
  • Speech gesture
  • Speech production

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Hu, C., Zhuang, X., & Hasegawa-Johnson, M. A. (2010). FSM-based pronunciation modeling using articulatory phonological code. 2274-2277. Paper presented at 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, Japan.

@conference{04f19e132f414a3488114d43dbc2303b,
title = "FSM-based pronunciation modeling using articulatory phonological code",
abstract = "According to articulatory phonology, the gestural score is an invariant speech representation. Though the timing schemes, i.e., the onsets and offsets, of the gestural activations may vary, the ensemble of these activations tends to remain unchanged, informing the speech content. In this work, we propose a pronunciation modeling method that uses a finite state machine (FSM) to represent the invariance of a gestural score. Given the {"}canonical{"} gestural score (CGS) of a word with a known activation timing scheme, the plausible activation onsets and offsets are recursively generated and encoded as a weighted FSM. An empirical measure is used to prune out gestural activation timing schemes that deviate too much from the CGS. Speech recognition is achieved by matching the recovered gestural activations to the FSM-encoded gestural scores of different speech contents. We carry out pilot word classification experiments using synthesized data from one speaker. The proposed pronunciation modeling achieves over 90{\%} accuracy for a vocabulary of 139 words with no training observations, outperforming direct use of the CGS.",
keywords = "Articulatory phonology, Finite state machine, Speech gesture, Speech production",
author = "Chi Hu and Xiaodan Zhuang and Hasegawa-Johnson, {Mark Allan}",
year = "2010",
month = "12",
day = "1",
language = "English (US)",
pages = "2274--2277",
note = "11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 ; Conference date: 26-09-2010 Through 30-09-2010",

}

TY - CONF
T1 - FSM-based pronunciation modeling using articulatory phonological code
AU - Hu, Chi
AU - Zhuang, Xiaodan
AU - Hasegawa-Johnson, Mark Allan
PY - 2010/12/1
Y1 - 2010/12/1
N2 - According to articulatory phonology, the gestural score is an invariant speech representation. Though the timing schemes, i.e., the onsets and offsets, of the gestural activations may vary, the ensemble of these activations tends to remain unchanged, informing the speech content. In this work, we propose a pronunciation modeling method that uses a finite state machine (FSM) to represent the invariance of a gestural score. Given the "canonical" gestural score (CGS) of a word with a known activation timing scheme, the plausible activation onsets and offsets are recursively generated and encoded as a weighted FSM. An empirical measure is used to prune out gestural activation timing schemes that deviate too much from the CGS. Speech recognition is achieved by matching the recovered gestural activations to the FSM-encoded gestural scores of different speech contents. We carry out pilot word classification experiments using synthesized data from one speaker. The proposed pronunciation modeling achieves over 90% accuracy for a vocabulary of 139 words with no training observations, outperforming direct use of the CGS.

KW - Articulatory phonology
KW - Finite state machine
KW - Speech gesture
KW - Speech production
UR - http://www.scopus.com/inward/record.url?scp=79959812754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959812754&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:79959812754
SP - 2274
EP - 2277
ER -