TY - GEN
T1 - Landmark-based speech recognition
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
AU - Hasegawa-Johnson, Mark
AU - Baker, James
AU - Borys, Sarah
AU - Chen, Ken
AU - Coogan, Emily
AU - Greenberg, Steven
AU - Juneja, Amit
AU - Kirchhoff, Katrin
AU - Livescu, Karen
AU - Mohan, Srividya
AU - Muller, Jennifer
AU - Sonmez, Kemal
AU - Wang, Tianyu
PY - 2005
N2 - Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multi-frame acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
UR - http://www.scopus.com/inward/record.url?scp=27144481719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27144481719&partnerID=8YFLogxK
DO - 10.1109/ICASSP.2005.1415088
M3 - Conference contribution
AN - SCOPUS:27144481719
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I213-I216
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 March 2005 through 23 March 2005
ER -