Mining spoken dialogue corpora for system evaluation and modeling

Frederic Bechet, Giuseppe Riccardi, Dilek Hakkani-Tur

Research output: Contribution to conference › Paper › peer-review


We are interested in the problem of modeling and evaluating spoken language systems in the context of human-machine dialogs. Spoken dialog corpora allow for a multidimensional analysis of speech recognition and language understanding models of dialog systems. Therefore language models can be directly trained based either on the dialog history or its equivalence class (or cluster). In this paper we propose an algorithm to mine dialog traces which exhibit similar patterns and are identified by the same class. For this purpose we apply data clustering methods to large human-machine spoken dialogue corpora. The resulting clusters can be used for system evaluation and language modeling. By clustering dialog traces we expect to learn about the behavior of the system with regard to not only the automation rate but also the nature of the interaction (e.g., easy vs. difficult dialogs). The equivalence classes can also be used to automatically adapt the language model, the understanding module and the dialogue strategy to better fit the kind of interaction detected. This paper investigates different ways of encoding dialogues into multidimensional structures and different clustering methods. Preliminary results are given for cluster interpretation and dynamic model adaptation using the clusters obtained.
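As a rough illustration of the idea of encoding dialog traces as multidimensional vectors and grouping them into equivalence classes, the following minimal sketch may help. It is not the authors' algorithm: the abstract does not specify the encodings or clustering methods used, so the turn labels, the bag-of-labels encoding, the trace-length feature, and the choice of plain k-means with hand-picked initial centroids are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical dialog traces: each is a sequence of turn labels.
# The label names are invented for illustration; they are not from the paper.
TRACES = [
    ["greeting", "request", "confirm", "close"],
    ["greeting", "request", "confirm", "close"],
    ["greeting", "request", "reprompt", "reprompt", "request", "reprompt", "transfer"],
    ["greeting", "reprompt", "request", "reprompt", "reprompt", "transfer"],
]


def encode(trace, vocab):
    """Map a trace to label counts over vocab, plus the trace length (assumed encoding)."""
    counts = Counter(trace)
    return [float(counts[label]) for label in vocab] + [float(len(trace))]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


VOCAB = sorted({label for trace in TRACES for label in trace})
POINTS = [encode(t, VOCAB) for t in TRACES]
# Seed one centroid from a short trace and one from a long, reprompt-heavy trace.
LABELS = kmeans(POINTS, init_centroids=[POINTS[0], POINTS[2]])
print(LABELS)
```

On this toy data the two short, successful traces fall into one cluster and the two reprompt-heavy traces into the other, which loosely mirrors the "easy vs. difficult dialogs" distinction mentioned in the abstract. A cluster label obtained this way could then, in principle, select which adapted language model to use.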

Original language: English (US)
Number of pages: 8
State: Published - 2004
Externally published: Yes
Event: 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 - Barcelona, Spain
Duration: Jul 25 2004 - Jul 26 2004



ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

