TY - GEN
T1 - Novel time domain multi-class SVMs for landmark detection
AU - Chitturi, Rahul
AU - Hasegawa-Johnson, Mark
PY - 2006
Y1 - 2006
AB - The training of precise speech recognition models depends on accurate segmentation of the phonemes in a training corpus. Segmentation is typically performed using HMMs, but recent speech recognition work suggests that the transient acoustic features characteristic of manner-class phoneme boundaries (landmarks) may be more precisely localized using acoustic classifiers specifically designed for the task of landmark detection. This paper empirically explores new features suited to landmark detection and the application of multi-class SVMs that can improve the time alignment of phoneme boundaries proposed by binary SVMs and an HMM-based speech recognizer. On a standard benchmark data set (a database of Telugu, an official Indian language spoken by 75 million people), we achieve new state-of-the-art performance, reducing RMS phone boundary alignment error from 32 ms to 22 ms.
KW - Landmark
KW - Multi class SVM
KW - Segmentation
KW - Time domain flatness measure
UR - http://www.scopus.com/inward/record.url?scp=44949264043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=44949264043&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:44949264043
SN - 9781604234497
T3 - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
SP - 2354
EP - 2357
BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
PB - International Speech Communication Association
T2 - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Y2 - 17 September 2006 through 21 September 2006
ER -