A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA

Edward C. Lin, Kai Yu, Robin A Rutenbar, Tsuhan Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that, as with graphics, the application is simply too performance-hungry and too power-sensitive to remain a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000-word vocabulary, is speaker-independent, recognizes continuous (connected) speech, and is a "live mode" engine, wherein recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small and achieves the same accuracy as state-of-the-art software recognizers while running at a fraction of the clock speed.

Original language: English (US)
Title of host publication: FPGA 2007
Subtitle of host publication: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Pages: 60-68
Number of pages: 9
DOIs: https://doi.org/10.1145/1216919.1216928
State: Published - Oct 2 2007
Event: FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - Monterey, CA, United States
Duration: Feb 18 2007 to Feb 20 2007

Publication series

Name: ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA

Other

Other: FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Country: United States
City: Monterey, CA
Period: 2/18/07 to 2/20/07

Keywords

  • DSP
  • FPGA
  • In silico vox
  • Speech recognition

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Lin, E. C., Yu, K., Rutenbar, R. A., & Chen, T. (2007). A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA. In FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 60-68). [1216928] (ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA). https://doi.org/10.1145/1216919.1216928