Combining classifiers for spoken language understanding

Mercan Karahan, Dilek Hakkani-Tür, Giuseppe Riccardi, Gokhan Tur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We are interested in the problem of understanding spontaneous speech in the context of human-machine dialogs. Utterance classification is a key component of the understanding process to determine the intent of the user. This paper presents methods for combining different statistical classifiers for spoken language understanding. We propose three combination methods. The first combines the scores assigned to the call-types by individual classifiers using a voting mechanism. The second method is a cascaded approach. The third method employs a top-level learner to decide on the final call-type. We have evaluated these combination methods over three large spoken dialog databases (∼10⁶ dialogs) collected using the AT&T natural spoken dialog system for customer care applications. The results indicate that it is possible to significantly reduce the error rate of the understanding module using these combination methods.
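The abstract names the three combination schemes but not their details. The sketch below is a generic illustration under stated assumptions, not the authors' implementation. It assumes each classifier returns a dict mapping call-types to scores in [0, 1]. Voting is implemented as an (optionally weighted) sum of scores. The cascade is implemented as a confidence-threshold fallback from one classifier to the next. The top-level learner is implemented as a multiclass perceptron trained on the concatenated score vectors. The function and class names, the threshold value, and the choice of a perceptron are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: each classifier maps an utterance to
# {call_type: score}, with scores assumed to lie in [0, 1].
Scores = Dict[str, float]


def vote(score_dicts: Sequence[Scores],
         weights: Optional[Sequence[float]] = None) -> str:
    """Score voting: sum (optionally weighted) scores per call-type, return the best."""
    if weights is None:
        weights = [1.0] * len(score_dicts)
    totals: Dict[str, float] = defaultdict(float)
    for w, scores in zip(weights, score_dicts):
        for call_type, s in scores.items():
            totals[call_type] += w * s
    return max(totals, key=totals.get)


def cascade(score_dicts: Sequence[Scores], threshold: float = 0.5) -> str:
    """Cascade: accept the first classifier whose top score clears the threshold;
    if none does, fall back to the last classifier's best call-type."""
    for scores in score_dicts:
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best
    last = score_dicts[-1]
    return max(last, key=last.get)


class StackedPerceptron:
    """Top-level learner: a multiclass perceptron whose input features are the
    scores from every base classifier, concatenated, plus a bias term."""

    def __init__(self, call_types: List[str], n_classifiers: int):
        self.call_types = list(call_types)
        dim = len(self.call_types) * n_classifiers + 1  # +1 for the bias feature
        self.w = {c: [0.0] * dim for c in self.call_types}

    def _features(self, score_dicts: Sequence[Scores]) -> List[float]:
        # Missing call-types get a score of 0.0.
        feats = [s.get(c, 0.0) for s in score_dicts for c in self.call_types]
        return feats + [1.0]

    def predict(self, score_dicts: Sequence[Scores]) -> str:
        x = self._features(score_dicts)
        return max(self.call_types,
                   key=lambda c: sum(wi * xi for wi, xi in zip(self.w[c], x)))

    def fit(self, data: Sequence[Tuple[Sequence[Scores], str]],
            epochs: int = 10) -> "StackedPerceptron":
        # Standard perceptron update: on a mistake, move the correct class's
        # weights toward x and the wrongly predicted class's weights away from x.
        for _ in range(epochs):
            for score_dicts, label in data:
                guess = self.predict(score_dicts)
                if guess != label:
                    x = self._features(score_dicts)
                    self.w[label] = [wi + xi for wi, xi in zip(self.w[label], x)]
                    self.w[guess] = [wi - xi for wi, xi in zip(self.w[guess], x)]
        return self
```

For example, with two classifiers scoring an utterance as `{"Billing": 0.6, "Agent": 0.4}` and `{"Billing": 0.2, "Agent": 0.7}`, unweighted voting picks `Agent` (1.1 vs. 0.8). A cascade with threshold 0.5 stops at the first classifier and picks `Billing`. The stacked learner, unlike fixed voting, can learn from held-out data which classifier to trust for which call-type.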

Original language: English (US)
Title of host publication: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 589-594
Number of pages: 6
ISBN (Electronic): 0780379802, 9780780379800
State: Published - 2003
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 - St. Thomas, United States
Duration: Nov 30 2003 to Dec 4 2003

Publication series

Name: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Country/Territory: United States
City: St. Thomas
Period: 11/30/03 to 12/4/03

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
