TY - GEN
T1 - Multivariate-state hidden Markov models for simultaneous transcription of phones and formants
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright: © 2000 IEEE.
PY - 2000
Y1 - 2000
N2 - A multivariate-state HMM (an HMM with a vector state variable) can be used to find jointly optimal phonetic and formant transcriptions of an utterance. The complexity of searching a multivariate state space using the Baum-Welch algorithm is substantial, but may be significantly reduced if the formant frequencies are assumed to be conditionally independent given knowledge of the phone. Operating with a known phonetic transcription, the multivariate-state model can provide a maximum a posteriori formant trajectory, complete with confidence limits on each of the formant frequency measurements. The model can also be used as a phonetic classifier by adding the probabilities of all possible formant trajectories. A test system is described which requires only nine trainable parameters per formant per phonetic state: five parameters to model formant transitions, and four to model spectral observations. Further simplifications were achieved through parameter tying.
AB - A multivariate-state HMM (an HMM with a vector state variable) can be used to find jointly optimal phonetic and formant transcriptions of an utterance. The complexity of searching a multivariate state space using the Baum-Welch algorithm is substantial, but may be significantly reduced if the formant frequencies are assumed to be conditionally independent given knowledge of the phone. Operating with a known phonetic transcription, the multivariate-state model can provide a maximum a posteriori formant trajectory, complete with confidence limits on each of the formant frequency measurements. The model can also be used as a phonetic classifier by adding the probabilities of all possible formant trajectories. A test system is described which requires only nine trainable parameters per formant per phonetic state: five parameters to model formant transitions, and four to model spectral observations. Further simplifications were achieved through parameter tying.
UR - http://www.scopus.com/inward/record.url?scp=0033692965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033692965&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2000.861822
DO - 10.1109/ICASSP.2000.861822
M3 - Conference contribution
AN - SCOPUS:0033692965
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1323
EP - 1326
BT - Speech Processing II
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000
Y2 - 5 June 2000 through 9 June 2000
ER -