Joint decoding for speech recognition and semantic tagging

Anoop Deoras, Ruhi Sarikaya, Gokhan Tur, Dilek Hakkani-Tür

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Most conversational understanding (CU) systems today employ a cascade approach, where the best hypothesis from the automatic speech recognizer (ASR) is fed into a spoken language understanding (SLU) module, whose best hypothesis is then fed into other systems such as an interpreter or dialog manager. In such approaches, errors from one statistical module irreversibly propagate into another module, causing a serious degradation in the overall performance of the conversational understanding system. Thus it is desirable to jointly optimize all the statistical modules together. As a first step towards this, in this paper, we propose a joint decoding framework in which we predict the optimal word as well as slot (semantic tag) sequence jointly given the input acoustic stream. On Microsoft's CU system, we show a 1.3% absolute reduction in word error rate (WER) and a 1.2% absolute improvement in F measure for slot prediction when compared to a very strong cascade baseline comprising the state-of-the-art recognizer followed by a slot sequence tagger.
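
The contrast between cascade and joint decoding can be illustrated with a minimal sketch. The abstract does not describe the authors' models or search procedure, so this example substitutes simple n-best rescoring as a stand-in. The hypotheses, probabilities, slot labels, function names and the `slu_weight` parameter are all invented for illustration.

```python
import math

# Toy n-best list (invented): word sequence -> ASR log P(W | A).
ASR_NBEST = {
    ("flights", "two", "boston"): math.log(0.55),
    ("flights", "to", "boston"): math.log(0.45),
}

# Toy tagger output (invented): word sequence -> (best slots, log P(S | W)).
TAGGER = {
    ("flights", "two", "boston"): (("O", "O", "B-city"), math.log(0.30)),
    ("flights", "to", "boston"): (("O", "O", "B-city"), math.log(0.90)),
}


def cascade_decode(nbest, tagger):
    """Commit to the ASR 1-best first, then tag only that hypothesis."""
    words = max(nbest, key=nbest.get)
    slots, _ = tagger[words]
    return words, slots


def joint_decode(nbest, tagger, slu_weight=1.0):
    """Pick the (words, slots) pair with the best combined log score.

    slu_weight is a hypothetical interpolation weight, not a value
    taken from the paper.
    """
    def combined(words):
        return nbest[words] + slu_weight * tagger[words][1]

    words = max(nbest, key=combined)
    return words, tagger[words][0]


if __name__ == "__main__":
    print("cascade:", cascade_decode(ASR_NBEST, TAGGER))
    print("joint:  ", joint_decode(ASR_NBEST, TAGGER))
```

In this toy setup the cascade keeps the recognizer's top hypothesis even though the tagger finds it implausible. The joint score lets the tagger's confidence promote the second hypothesis, which is the kind of error recovery the abstract argues for.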

Original language: English (US)
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 1066-1069
Number of pages: 4
State: Published - 2012
Externally published: Yes
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: Sep 9 2012 to Sep 13 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 2

Other

Other: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory: United States
City: Portland, OR
Period: 9/9/12 to 9/13/12

Keywords

  • ASR
  • CRF
  • CU
  • ME
  • SLU

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication
