Large Language Models as User-Agents For Evaluating Task-Oriented-Dialogue Systems

Taaha Kazi, Ruiliang Lyu, Sizhe Zhou, Dilek Hakkani-Tur, Gokhan Tur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditionally, offline datasets have been used to evaluate task-oriented dialogue (TOD) models. These datasets lack context awareness, making them suboptimal benchmarks for conversational systems. In contrast, user-agents, which are context-aware, can simulate the variability and unpredictability of human conversations, making them better alternatives as evaluators. Prior research has utilized large language models (LLMs) to develop user-agents. Our work builds upon this by using LLMs to create user-agents for the evaluation of TOD systems. This involves prompting an LLM, using in-context examples as guidance, and tracking the user-goal state. Our evaluation of diversity and task completion metrics for the user-agents shows improved performance with the use of better prompts. Additionally, we propose methodologies for the automatic evaluation of TOD models within this dynamic framework. We make our code publicly available.
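The abstract only names the main ingredients of the user-agent (an LLM prompted with in-context examples, plus tracking of the user-goal state). The sketch below is a minimal, hypothetical illustration of how such a simulation loop could be wired up, not the authors' implementation: the prompt layout, the call_llm and tod_system callables, and the slot-matching heuristic for goal-state tracking are all assumptions made for this example.

# Hypothetical sketch of an LLM user-agent for evaluating a TOD system.
# call_llm is assumed to be any text-in/text-out LLM interface; the prompt
# format and the goal-state heuristic are illustrative only.
from typing import Callable, Dict, List

IN_CONTEXT_EXAMPLES = """\
Example dialogue:
System: How can I help you?
User: I need a cheap hotel in the centre with free wifi.
"""

def build_user_prompt(goal: Dict[str, str], history: List[str]) -> str:
    """Assemble the user-agent prompt: goal description, in-context examples, history."""
    goal_text = ", ".join(f"{slot}={value}" for slot, value in goal.items())
    return (
        "You are a user talking to a task-oriented dialogue system.\n"
        f"Your goal: book a hotel with {goal_text}.\n"
        "Mention only what is needed for the next turn.\n\n"
        f"{IN_CONTEXT_EXAMPLES}\n"
        "Current dialogue:\n" + "\n".join(history) + "\nUser:"
    )

def update_goal_state(goal_state: Dict[str, bool],
                      goal: Dict[str, str],
                      utterance: str) -> None:
    """Naive goal-state tracking: mark a slot fulfilled once its value is mentioned."""
    for slot, value in goal.items():
        if value.lower() in utterance.lower():
            goal_state[slot] = True

def simulate_dialogue(call_llm: Callable[[str], str],
                      tod_system: Callable[[str], str],
                      goal: Dict[str, str],
                      max_turns: int = 10) -> Dict[str, bool]:
    """Run one simulated conversation and return which goal slots were covered."""
    history: List[str] = []
    goal_state = {slot: False for slot in goal}
    for _ in range(max_turns):
        user_turn = call_llm(build_user_prompt(goal, history)).strip()
        history.append(f"User: {user_turn}")
        update_goal_state(goal_state, goal, user_turn)
        if all(goal_state.values()):
            break
        system_turn = tod_system(user_turn).strip()
        history.append(f"System: {system_turn}")
    return goal_state

Under these assumptions, a task completion score could be computed as the fraction of goal slots covered across simulated dialogues, and diversity from the lexical variety of the generated user turns.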

Original language: English (US)
Title of host publication: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 913-920
Number of pages: 8
ISBN (Electronic): 9798350392258
State: Published - 2024
Event: 2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China
Duration: Dec 2, 2024 - Dec 5, 2024

Publication series

Name: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference: 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory: China
City: Macao
Period: 12/2/24 - 12/5/24

Keywords

  • Task-oriented dialogue systems
  • large language models
  • task completion
  • user simulation agents

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
