Learning when to listen: Detecting system-addressed speech in human-human-computer dialog

Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry Heck

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

New challenges arise for addressee detection when multiple people interact jointly with a spoken dialog system using unconstrained natural language. We study the problem of discriminating computer-directed from human-directed speech in a new corpus of human-human-computer (H-H-C) dialog, using lexical and prosodic features. The prosodic features use no word, context, or speaker information. Using speech recognition output with a 19% word error rate (WER), results improve from lexical features (equal error rate, EER = 23.1%) to prosodic features (EER = 12.6%) to a combined model (EER = 11.1%). Adding prosodic features also yields a 35% relative error reduction over a lexical model that uses true (reference) words (EER from 10.2% to 6.7%). Modeling energy contours with Gaussian mixture models (GMMs) provides a particularly good prosodic model. While lexical models perform well for commands, they confuse free-form system-directed speech with human-human speech. Prosodic models dramatically reduce these confusions, implying that users change their speaking style as they shift addressees (computer versus human) within a session. Overall, the results strongly support combining simple acoustic-prosodic models with lexical models to detect speaking-style differences for this task.
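The abstract reports all results as equal error rates (EER): the operating point at which the rate of human-directed speech falsely accepted as system-directed equals the rate of system-directed speech that is missed. The Python sketch below shows one common way to approximate an EER from detector scores. It assumes that higher scores mean "more likely computer-directed" and uses a simple threshold sweep; it is not necessarily the scoring procedure used in the paper, and the function name `equal_error_rate` and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER of a binary addressee detector (hypothetical helper).

    labels: 1 = computer-directed (target class), 0 = human-directed.
    scores: higher values mean "more likely computer-directed".
    Sweeps every observed score as a threshold and returns the mean of the
    false-accept and false-reject rates where the two are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    best_gap, best_eer = math.inf, 1.0
    # Candidate thresholds: each observed score, plus +inf (reject everything).
    for t in sorted(set(scores)) + [math.inf]:
        far = sum(s >= t for s in neg) / len(neg)  # human speech accepted as system-directed
        frr = sum(s < t for s in pos) / len(pos)   # system-directed speech missed
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy scores for illustration only, not data from the paper.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(equal_error_rate(scores, labels))  # 0.25
```

Lower is better. The abstract's progression from 23.1% (lexical) to 11.1% (combined) means the combined detector's misses and false alarms balance at a much lower rate. On small test sets a threshold sweep can only reach a coarse grid of error rates, so implementations often interpolate between the two closest operating points.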

Original language: English (US)
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Publisher: International Speech Communication Association
Pages: 334-337
Number of pages: 4
ISBN (Print): 9781622767595
State: Published - 2012
Externally published: Yes
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: Sep 9 2012 to Sep 13 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 1


Keywords

  • Addressee detection
  • Boosting
  • GMM
  • Language model
  • Logistic regression
  • Prosody
  • Spoken dialog system

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication
