Active labeling for spoken language understanding

Gokhan Tur, Mazin Rahim, Dilek Hakkani-Tür

Research output: Contribution to conference › Paper › peer-review

Abstract

State-of-the-art spoken language understanding (SLU) systems are trained using human-labeled utterances, the preparation of which is labor-intensive and time-consuming. Labeling is an error-prone process for various reasons, such as labeler errors or imperfect descriptions of classes. Thus, a second pass (or more) of labeling is usually required to check and fix the labeling errors and inconsistencies of earlier passes. In this paper, we examine the effect of labeling errors on statistical call classification and evaluate methods of finding and correcting these errors while checking a minimum amount of data. We describe two alternative methods to speed up the labeling effort: one is based on the confidences obtained from a prior model, and the other is completely unsupervised. We call the labeling process that employs one of these methods active labeling. Active labeling aims to minimize the number of utterances to be checked again by automatically selecting the ones that are likely to be erroneous or inconsistent with the previously labeled examples. Although the very same methods can be used as a postprocessing step to correct labeling errors, we consider them only as part of the labeling process. We have evaluated these active labeling methods using a call classification system used for the AT&T natural dialog customer care system. Our results indicate that it is possible to find about 90% of the labeling errors or inconsistencies by checking just half the data.
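
The confidence-based selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, so this assumes one plausible reading: each utterance is scored by the probability a prior model assigns to the label the human chose, and the lowest-scoring utterances are sent back for checking. The function name `select_for_recheck`, the `fraction` parameter, and the example call types are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def select_for_recheck(
    labels: Sequence[str],
    posteriors: Sequence[Dict[str, float]],
    fraction: float = 0.5,
) -> List[int]:
    """Return indices of utterances to re-check, least trusted first.

    labels[i] is the human-assigned class of utterance i.
    posteriors[i] maps class names to a prior model's probabilities.
    Assumption: a low probability for the human label suggests an error.
    """
    if len(labels) != len(posteriors):
        raise ValueError("labels and posteriors must have the same length")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Confidence of the prior model in the label the human assigned;
    # a class the model did not score counts as probability 0.
    scores = [post.get(label, 0.0) for label, post in zip(labels, posteriors)]

    # Rank utterances from least to most trusted (stable on ties).
    ranked = sorted(range(len(labels)), key=lambda i: scores[i])

    # Keep only as many utterances as the checking budget allows.
    budget = int(len(labels) * fraction)
    return ranked[:budget]


# Hypothetical example: four labeled utterances and model posteriors.
example_labels = ["billing", "cancel", "billing", "operator"]
example_posteriors = [
    {"billing": 0.9, "cancel": 0.1},
    {"billing": 0.8, "cancel": 0.2},
    {"billing": 0.55, "operator": 0.45},
    {"operator": 0.3, "billing": 0.7},
]
to_check = select_for_recheck(example_labels, example_posteriors, fraction=0.5)
print(to_check)  # [1, 3]
```

With `fraction=0.5` the sketch checks half the data, mirroring the budget mentioned in the abstract; how many errors that budget recovers depends on the prior model and is not something this example measures.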

Original language: English (US)
Pages: 2789-2792
Number of pages: 4
State: Published - 2003
Externally published: Yes
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: Sep 1 2003 to Sep 4 2003

Other

Other: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
Country/Territory: Switzerland
City: Geneva
Period: 9/1/03 to 9/4/03

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication
