Eye gaze for spoken language understanding in multi-modal conversational interactions

Dilek Hakkani-Tür, Malcolm Slaney, Asli Celikyilmaz, Larry Heck

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When humans converse with each other, they naturally amalgamate information from multiple modalities (i.e., speech, gestures, speech prosody, facial expressions, and eye gaze). This paper focuses on eye gaze and its combination with speech. We develop a model that resolves references to visual (screen) elements in a conversational web browsing system. The system detects eye gaze, recognizes speech, and then interprets the user's browsing intent (e.g., click on a specific element) through a combination of spoken language understanding and eye gaze tracking. We experiment with multi-turn interactions collected in a Wizard-of-Oz scenario where users are asked to perform several web-browsing tasks. We compare several gaze features and evaluate their effectiveness when combined with speech-based lexical features. The resulting multi-modal system not only increases user intent (turn) accuracy by 17%, but also resolves the referring expression ambiguity commonly observed in dialog systems, with a 10% increase in F-measure.

Original language: English (US)
Title of host publication: ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery
Pages: 263-266
Number of pages: 4
ISBN (Electronic): 9781450328852
State: Published - Nov 12 2014
Externally published: Yes
Event: 16th ACM International Conference on Multimodal Interaction, ICMI 2014 - Istanbul, Turkey
Duration: Nov 12 2014 - Nov 16 2014

Publication series

Name: ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction

Conference

Conference: 16th ACM International Conference on Multimodal Interaction, ICMI 2014
Country/Territory: Turkey
City: Istanbul
Period: 11/12/14 - 11/16/14

Keywords

  • Eye gaze
  • Reference resolution
  • Spoken language understanding

ASJC Scopus subject areas

  • Human-Computer Interaction
