Multivariate-state hidden Markov models for simultaneous transcription of phones and formants

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A multivariate-state HMM, that is, an HMM with a vector state variable, can be used to find jointly optimal phonetic and formant transcriptions of an utterance. The complexity of searching a multivariate state space using the Baum-Welch algorithm is substantial, but may be significantly reduced if the formant frequencies are assumed to be conditionally independent given knowledge of the phone. Operating with a known phonetic transcription, the multivariate-state model can provide a maximum a posteriori formant trajectory, complete with confidence limits on each of the formant frequency measurements. The model can also be used as a phonetic classifier by adding the probabilities of all possible formant trajectories. A test system is described which requires only nine trainable parameters per formant per phonetic state: five parameters to model formant transitions, and four to model spectral observations. Further simplifications were achieved through parameter tying.
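The complexity reduction mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's test system. It assumes formant frequencies quantized into K bins, a known phone label per frame, and observation likelihoods that factor across formants (that last point is an assumption of this sketch, not something the abstract states). All names (`forward_loglik`, `factored_loglik`, `joint_loglik`, the random toy parameters) are hypothetical. Under these assumptions, summing over all formant trajectories given the phones takes one forward pass per formant over K states, instead of one pass over K^F joint states.

```python
import numpy as np


def forward_loglik(init, trans, obs_lik):
    """Scaled forward pass for a single chain.

    init:    (K,)        initial state probabilities
    trans:   (T-1, K, K) frame-dependent transition matrices (rows sum to 1)
    obs_lik: (T, K)      observation likelihood of each state at each frame
    Returns the log of the likelihood summed over all state trajectories.
    """
    alpha = init * obs_lik[0]
    loglik = 0.0
    for t in range(len(obs_lik)):
        if t > 0:
            alpha = (alpha @ trans[t - 1]) * obs_lik[t]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return loglik


def factored_loglik(inits, transes, obs_liks):
    """One K-state forward pass per formant; log-likelihoods add."""
    return sum(forward_loglik(i, a, b) for i, a, b in zip(inits, transes, obs_liks))


def joint_loglik(inits, transes, obs_liks):
    """Reference computation on the full K**F product state space."""
    init, trans, obs = inits[0], transes[0], obs_liks[0]
    for i, a, b in zip(inits[1:], transes[1:], obs_liks[1:]):
        init = np.kron(init, i)
        trans = np.stack([np.kron(x, y) for x, y in zip(trans, a)])
        obs = np.stack([np.kron(x, y) for x, y in zip(obs, b)])
    return forward_loglik(init, trans, obs)


# Toy problem with random (hypothetical) parameters.
rng = np.random.default_rng(0)
K, F, T, P = 4, 3, 6, 2  # bins per formant, formants, frames, phones


def random_stochastic(shape):
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)


phones = rng.integers(0, P, size=T)  # known phonetic transcription
inits = [random_stochastic(K) for _ in range(F)]
# Each formant has its own phone-dependent transition matrices.
transes = [random_stochastic((P, K, K))[phones[1:]] for _ in range(F)]
obs_liks = [rng.random((T, K)) for _ in range(F)]

factored = factored_loglik(inits, transes, obs_liks)
joint = joint_loglik(inits, transes, obs_liks)
```

In this toy setting the two functions should agree up to floating-point error, while the factored version handles F matrices of size K x K per frame rather than one of size K^F x K^F. A phonetic classifier in the spirit of the abstract would compare this summed likelihood across candidate phone sequences.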

Original language: English (US)
Title of host publication: Speech Processing II
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1323-1326
Number of pages: 4
ISBN (Electronic): 0780362934
State: Published - 2000
Event: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000 - Istanbul, Turkey
Duration: Jun 5 2000 to Jun 9 2000

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 3
ISSN (Print): 1520-6149

Other

Other: 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Country/Territory: Turkey
City: Istanbul
Period: 6/5/00 to 6/9/00

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
