Acoustic segmentation using switching state Kalman filter

Research output: Contribution to journal › Conference article

Abstract

Segmenting the acoustic signal in the TIMIT database by a switching state Kalman filter model is reported in this paper. According to the assumption that the high-dimensional acoustic feature vector of the LSF (Line Spectrum Frequency) of the speech signal is probably embedded in a low-dimensional space, a two-dimensional vector is used to represent the continuous state vector in this model. The parameters of the model are initialized by PPCA (probabilistic principal component analysis) and first-order vector auto-regression, and are re-estimated by the EM algorithm. We show that this model can be used to classify vowels, nasals, frication and silence by an approximate Viterbi inference.
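
The initialization step named in the abstract (PPCA followed by a first-order vector auto-regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ppca_init` and `var1_init`, the feature dimension, and the use of a single regime are all assumptions made for illustration. In a switching model, one such parameter set would presumably be fitted per discrete state, and the EM re-estimation and approximate Viterbi decoding are not shown.

```python
import numpy as np

def ppca_init(Y, q=2):
    """Closed-form PPCA fit (hypothetical helper).

    Y: (T, d) array of feature frames, e.g. LSF vectors.
    Returns the mean, loading matrix W (d, q), noise variance,
    and the posterior-mean latent trajectory X (T, q).
    """
    mu = Y.mean(axis=0)
    Yc = Y - mu
    S = Yc.T @ Yc / len(Y)                      # sample covariance (d, d)
    evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending
    sigma2 = evals[q:].mean()                   # mean of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    M = W.T @ W + sigma2 * np.eye(q)
    X = Yc @ W @ np.linalg.inv(M)               # E[x | y] = M^{-1} W^T (y - mu)
    return mu, W, sigma2, X

def var1_init(X):
    """Least-squares fit of x_t = A x_{t-1} + w_t (hypothetical helper).

    Returns the transition matrix A (q, q) and residual covariance Q (q, q).
    """
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)  # X[1:] ~ X[:-1] @ B
    resid = X[1:] - X[:-1] @ B
    Q = resid.T @ resid / len(resid)
    return B.T, Q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(500, 10))   # stand-in for real feature frames (assumed d = 10)
    mu, W, sigma2, X = ppca_init(Y, q=2)
    A, Q = var1_init(X)
    print(W.shape, A.shape, Q.shape)
```

Here `W` and `sigma2` would seed the observation equation of the Kalman filter, and `A` and `Q` the two-dimensional state equation, before EM refines them.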

Original language: English (US)
Pages (from-to): 752-755
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
State: Published - Sep 25 2003
Event: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing - Hong Kong, Hong Kong
Duration: Apr 6 2003 to Apr 10 2003

Fingerprint

Kalman filters
Acoustics

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{f2e83fd15fc84a26aa2a9314c529a212,
title = "Acoustic segmentation using switching state Kalman filter",
abstract = "Segmenting the acoustic signal in the TIMIT database by a switching state Kalman filter model is reported in this paper. According to the assumption that the high-dimensional acoustic feature vector of the LSF (Line Spectrum Frequency) of the speech signal is probably embedded in a low-dimensional space, a two-dimensional vector is used to represent the continuous state vector in this model. The parameters of the model are initialized by PPCA (probabilistic principal component analysis) and first-order vector auto-regression, and are re-estimated by the EM algorithm. We show that this model can be used to classify vowels, nasals, frication and silence by an approximate Viterbi inference.",
author = "Zheng, Yanli and Hasegawa-Johnson, {Mark Allan}",
year = "2003",
month = "9",
day = "25",
language = "English (US)",
volume = "1",
pages = "752--755",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - JOUR
T1  - Acoustic segmentation using switching state Kalman filter
AU  - Zheng, Yanli
AU  - Hasegawa-Johnson, Mark Allan
PY  - 2003/9/25
Y1  - 2003/9/25
AB  - Segmenting the acoustic signal in the TIMIT database by a switching state Kalman filter model is reported in this paper. According to the assumption that the high-dimensional acoustic feature vector of the LSF (Line Spectrum Frequency) of the speech signal is probably embedded in a low-dimensional space, a two-dimensional vector is used to represent the continuous state vector in this model. The parameters of the model are initialized by PPCA (probabilistic principal component analysis) and first-order vector auto-regression, and are re-estimated by the EM algorithm. We show that this model can be used to classify vowels, nasals, frication and silence by an approximate Viterbi inference.
UR  - http://www.scopus.com/inward/record.url?scp=0141813641&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0141813641&partnerID=8YFLogxK
M3  - Conference article
VL  - 1
SP  - 752
EP  - 755
JO  - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF  - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
SN  - 0736-7791
ER  -