A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA

Edward C. Lin, Kai Yu, Robin A Rutenbar, Tsuhan Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that, like graphics chips, the application is simply too performance-hungry, and too power-sensitive, to stay as a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000-word vocabulary, is speaker-independent, recognizes continuous (connected) speech, and is a "live mode" engine, wherein recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small, and achieves the same accuracy as state-of-the-art software recognizers, while running at a fraction of the clock speed.

Original language: English (US)
Title of host publication: FPGA 2007
Subtitle of host publication: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Pages: 60-68
Number of pages: 9
DOI: https://doi.org/10.1145/1216919.1216928
State: Published - Oct 2 2007
Event: FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - Monterey, CA, United States
Duration: Feb 18 2007 to Feb 20 2007

Publication series

Name: ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA

Other

Other: FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Country: United States
City: Monterey, CA
Period: 2/18/07 to 2/20/07

Fingerprint

Field programmable gate arrays (FPGA)
Hardware
Speech recognition
Application programs
Clocks
Engines

Keywords

  • DSP
  • FPGA
  • In silico vox
  • Speech recognition

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Lin, E. C., Yu, K., Rutenbar, R. A., & Chen, T. (2007). A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA. In FPGA 2007: Fifteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 60-68). [1216928] (ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA). https://doi.org/10.1145/1216919.1216928

@inproceedings{b2d999bc319c45d59362df4155f632f6,
title = "A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA",
abstract = "The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that, like graphics chips, the application is simply too performance hungry, and too power sensitive, to stay as a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000 word vocabulary, is speaker-independent, recognizes continuous (connected) speech, and is a {"}live mode{"} engine, wherein recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small, and achieves the same accuracy as state-of-the-art software recognizers, while running at a fraction of the clock speed.",
keywords = "DSP, FPGA, In silico vox, Speech recognition",
author = "Lin, {Edward C.} and Kai Yu and Rutenbar, {Robin A} and Tsuhan Chen",
year = "2007",
month = "10",
day = "2",
doi = "10.1145/1216919.1216928",
language = "English (US)",
isbn = "1595936009",
series = "ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA",
pages = "60--68",
booktitle = "FPGA 2007",

}

TY - GEN
T1 - A 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA
AU - Lin, Edward C.
AU - Yu, Kai
AU - Rutenbar, Robin A
AU - Chen, Tsuhan
PY - 2007/10/2
Y1 - 2007/10/2

AB - The Carnegie Mellon In Silico Vox project seeks to move best-quality speech recognition technology from its current software-only form into a range of efficient all-hardware implementations. The central thesis is that, like graphics chips, the application is simply too performance hungry, and too power sensitive, to stay as a large software application. As a first step in this direction, we describe the design and implementation of a fully functional speech-to-text recognizer on a single Xilinx XUP platform. The design recognizes a 1000 word vocabulary, is speaker-independent, recognizes continuous (connected) speech, and is a "live mode" engine, wherein recognition can start as soon as speech input appears. To the best of our knowledge, this is the most complex recognizer architecture ever fully committed to a hardware-only form. The implementation is extraordinarily small, and achieves the same accuracy as state-of-the-art software recognizers, while running at a fraction of the clock speed.

KW - DSP
KW - FPGA
KW - In silico vox
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=34748853756&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34748853756&partnerID=8YFLogxK
U2 - 10.1145/1216919.1216928
DO - 10.1145/1216919.1216928
M3 - Conference contribution
AN - SCOPUS:34748853756
SN - 1595936009
SN - 9781595936004
T3 - ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA
SP - 60
EP - 68
BT - FPGA 2007
ER -