TY - JOUR
T1 - Speaker-independent, syntax-directed, connected word recognition system based on hidden Markov models and level building
AU - Rabiner, Lawrence R.
AU - Levinson, Stephen E.
PY - 1985
Y1 - 1985
N2 - In the last several years, a wide variety of techniques have been developed which make practical the implementation and development of large networks for recognizing connected sequences of words. Included among these techniques are efficient and accurate speech modeling methods (e.g., vector quantization, hidden Markov models) and efficient, optimal network search procedures (i.e., level building). It is shown how to integrate these techniques to give a speaker-independent, syntax-directed, connected word recognition system which requires only a modest amount of computation and has a performance comparable to that of previous recognizers requiring an order of magnitude more computation. In particular, the recognizer studied was an airline information and reservation system using a 129-word vocabulary and a deterministic syntax (grammar) with 144 states, 450 state transitions, and 21 final states, generating more than 6 × 10^9 sentences. An evaluation of the system, using six talkers each speaking 51 test sentences, yielded a sentence accuracy of about 75% resulting from a word accuracy of about 93%, for an average speaking rate of about 210 words per minute.
AB - In the last several years, a wide variety of techniques have been developed which make practical the implementation and development of large networks for recognizing connected sequences of words. Included among these techniques are efficient and accurate speech modeling methods (e.g., vector quantization, hidden Markov models) and efficient, optimal network search procedures (i.e., level building). It is shown how to integrate these techniques to give a speaker-independent, syntax-directed, connected word recognition system which requires only a modest amount of computation and has a performance comparable to that of previous recognizers requiring an order of magnitude more computation. In particular, the recognizer studied was an airline information and reservation system using a 129-word vocabulary and a deterministic syntax (grammar) with 144 states, 450 state transitions, and 21 final states, generating more than 6 × 10^9 sentences. An evaluation of the system, using six talkers each speaking 51 test sentences, yielded a sentence accuracy of about 75% resulting from a word accuracy of about 93%, for an average speaking rate of about 210 words per minute.
UR - http://www.scopus.com/inward/record.url?scp=0022083778&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0022083778&partnerID=8YFLogxK
U2 - 10.1109/tassp.1985.1164586
DO - 10.1109/tassp.1985.1164586
M3 - Article
AN - SCOPUS:0022083778
SN - 1053-587X
VL - ASSP-33
SP - 561
EP - 573
JO - IEEE Transactions on Acoustics, Speech, and Signal Processing
JF - IEEE Transactions on Acoustics, Speech, and Signal Processing
IS - 3
ER -