TY - GEN
T1 - A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA
AU - Lin, Edward C.
AU - Yu, Kai
AU - Rutenbar, Rob A.
AU - Chen, Tsuhan
PY - 2007
Y1 - 2007
AB - The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that, as with graphics chips, the application is simply too performance-hungry, and too power-sensitive, to stay as a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000-word vocabulary, is speaker-independent, recognizes continuous (connected) speech, and is a "live mode" engine, wherein recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small and achieves the same accuracy as state-of-the-art software recognizers while running at a fraction of the clock speed.
KW - DSP
KW - FPGA
KW - In silico vox
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=34748853756&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34748853756&partnerID=8YFLogxK
U2 - 10.1145/1216919.1216928
DO - 10.1145/1216919.1216928
M3 - Conference contribution
AN - SCOPUS:34748853756
SN - 1595936009
SN - 9781595936004
T3 - ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA
SP - 60
EP - 68
BT - FPGA 2007
T2 - FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Y2 - 18 February 2007 through 20 February 2007
ER -