Parallelizing WFST speech decoders

Charith Mendis, Jasha Droppo, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Geoffrey Zweig

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The performance-intensive part of a large-vocabulary continuous speech-recognition system is the Viterbi computation that determines the sequence of words that are most likely to generate the acoustic-state scores extracted from an input utterance. This paper presents an efficient parallel algorithm for Viterbi. The key idea is to partition the per-frame computation among threads to minimize inter-thread communication despite traversing a large irregular acoustic and language model graphs. Together with a per-thread beam search, load balancing language-model lookups, and memory optimizations, we achieve a 6.67x speedup over an highly-optimized production-quality WFST-based speech decoder. On a 200,000 word vocabulary and a 59 million ngram model, our decoder runs at 0.27x real time while achieving a word-error rate of 14.81% on 6214 labeled utterances from Voice Search data.

Original languageEnglish (US)
Title of host publication2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5325-5329
Number of pages5
ISBN (Electronic)9781479999880
DOIs
StatePublished - May 18 2016
Externally publishedYes
Event41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016Mar 25 2016

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2016-May
ISSN (Print)1520-6149

Other

Other41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/TerritoryChina
CityShanghai
Period3/20/163/25/16

Keywords

  • Large vocabulary
  • Parallel Viterbi
  • WFST Decoder

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Parallelizing WFST speech decoders'. Together they form a unique fingerprint.

Cite this