Confidence Estimation For LLM-Based Dialogue State Tracking

Yi Jyun Sun, Suvodip Dey, Dilek Hakkani-Tur, Gokhan Tur

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Estimating a model's confidence in its outputs is critical for Conversational AI systems based on large language models (LLMs), especially for reducing hallucination and preventing over-reliance. In this work, we provide an exhaustive exploration of methods, including approaches proposed for open- and closed-weight LLMs, aimed at quantifying and leveraging model uncertainty to improve the reliability of LLM-generated responses, with a specific focus on dialogue state tracking (DST) in task-oriented dialogue systems (TODS). Regardless of model type, well-calibrated confidence scores are essential for handling uncertainty and thereby improving model performance. We evaluate four methods for estimating confidence scores: softmax probabilities, raw token scores, verbalized confidences, and a combination of these. Calibration is assessed with the area under the curve (AUC) metric, where a higher AUC indicates better calibration. We also enhance these methods with a self-probing mechanism proposed for closed models. Furthermore, we assess these methods using an open-weight model fine-tuned for DST, which achieves superior joint goal accuracy (JGA). Our findings also suggest that fine-tuning open-weight LLMs can improve AUC, indicating better-calibrated confidence scores.
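The abstract names the quantities involved but not their formulas. As a rough illustration only, the Python sketch below assumes that (a) the softmax confidence of a slot value is the product of the softmax probabilities of its generated tokens, (b) the combined score is a logistic function of a weighted sum of the individual scores (softmax, raw token score, verbalized confidence), with weights fitted elsewhere, (c) AUC means ROC-AUC of the confidence score as a predictor of whether a slot value is correct, and (d) JGA is the fraction of turns whose full predicted state exactly matches the gold state. All function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_confidence(step_logits: Sequence[Sequence[float]],
                       token_ids: Sequence[int]) -> float:
    """ASSUMPTION: confidence of a generated slot value is the product of
    the softmax probabilities of each of its tokens at each decoding step."""
    conf = 1.0
    for logits, tok in zip(step_logits, token_ids):
        conf *= softmax(logits)[tok]
    return conf


def combine(scores: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """ASSUMPTION: combined confidence is a logistic function of a weighted
    sum of individual scores; weights and bias would be fitted on held-out data."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """ROC-AUC: probability that a randomly chosen correct prediction gets a
    higher confidence than a randomly chosen incorrect one (ties count 0.5)."""
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect predictions")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def joint_goal_accuracy(predicted: Sequence[dict[str, str]],
                        gold: Sequence[dict[str, str]]) -> float:
    """Fraction of turns whose full predicted dialogue state (every
    slot-value pair) exactly matches the gold state."""
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)


if __name__ == "__main__":
    # Perfect ranking: every correct value outscores every incorrect one.
    print(roc_auc([0.9, 0.8, 0.3, 0.6], [True, True, False, False]))  # 1.0
    # One inverted pair (0.4 < 0.6) out of four pairs.
    print(roc_auc([0.9, 0.4, 0.6, 0.3], [True, True, False, False]))  # 0.75
```

The pairwise loop in `roc_auc` is quadratic and meant only to make the definition explicit; for large evaluation sets, `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.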

Original language: English (US)
Title of host publication: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1083-1090
Number of pages: 8
ISBN (Electronic): 9798350392258
State: Published - 2024
Event: 2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China
Duration: Dec 2, 2024 to Dec 5, 2024

Publication series

Name: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference: 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory: China
City: Macao
Period: 12/2/24 to 12/5/24

Keywords

  • Task-oriented dialogue systems
  • confidence scores
  • dialogue state tracking
  • model uncertainty

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
