Abstract
We are interested in the problem of modeling and evaluating spoken language systems in the context of human-machine dialogs. Spoken dialog corpora allow for a multidimensional analysis of the speech recognition and language understanding models of dialog systems. Therefore, language models can be trained directly on either the dialog history or its equivalence class (or cluster). In this paper we propose an algorithm to mine dialog traces that exhibit similar patterns and are identified by the same class. For this purpose we apply data clustering methods to large human-machine spoken dialog corpora. The resulting clusters can be used for system evaluation and language modeling. By clustering dialog traces we expect to learn about the behavior of the system with regard to not only the automation rate but also the nature of the interaction (e.g., easy vs. difficult dialogs). The equivalence classes can also be used to automatically adapt the language model, the understanding module, and the dialog strategy to better fit the kind of interaction detected. This paper investigates different ways of encoding dialogs into multidimensional structures and different clustering methods. Preliminary results are given for cluster interpretation and dynamic model adaptation using the clusters obtained.
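To make the idea of encoding dialog traces as multidimensional structures and clustering them more concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not specify the encoding or the clustering algorithm, so this sketch assumes a simple bag-of-events count vector and a plain k-means with deterministic initialization. The event labels, the `TRACES` data, and the helper names `encode`, `sq_dist`, and `kmeans` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical dialog traces: each is a sequence of event labels.
# The labels are illustrative and not taken from the paper's corpus.
TRACES = [
    ["greeting", "request", "confirm", "closing"],
    ["greeting", "request", "confirm", "closing"],
    ["greeting", "request", "reject", "reprompt", "reject", "reprompt", "transfer"],
    ["greeting", "reject", "reprompt", "reject", "reprompt", "reject", "transfer"],
]


def encode(trace, vocab):
    """Encode a trace as a vector of event counts (an assumed encoding)."""
    counts = Counter(trace)
    return [float(counts[label]) for label in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means, seeded with the first k distinct vectors."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


vocab = sorted({label for trace in TRACES for label in trace})
vectors = [encode(trace, vocab) for trace in TRACES]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy data the two short, successful traces fall into one cluster and the two traces with repeated rejections and a transfer fall into the other, which loosely mirrors the easy vs. difficult distinction mentioned in the abstract. A count vector discards event order; an encoding that keeps order (for example, event n-grams) would be an equally plausible assumption.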
Original language | English (US) |
---|---|
Pages | 134-141 |
Number of pages | 8 |
State | Published - 2004 |
Externally published | Yes |
Event | 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 - Barcelona, Spain; Duration: Jul 25 2004 → Jul 26 2004 |
Conference
Conference | 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 7/25/04 → 7/26/04 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems