Extraction of pragmatic and semantic salience from spontaneous spoken English

Research output: Contribution to journalArticle

Abstract

This paper computationalizes two linguistic concepts, contrast and focus, for the extraction of pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics, as categories that link intonation and information/discourse structure. This paper demonstrates the automatic tagging of contrast and focus for the purpose of robust spontaneous speech understanding in a tutorial dialogue system. In particular, we propose two new transcription tasks, and demonstrate automatic replication of human labels in both tasks. First, we define focus kernel to represent those words that contain novel information neither presupposed by the interlocutor nor contained in the precedent words of the utterance. We propose detecting the focus kernel based on a word dissimilarity measure, part-of-speech tagging, and prosodic measurements including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. In order to measure the word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, which consists of a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. The symmetric contrast identification is performed in a way similar to the focus kernel detection. The effectiveness of the proposed extraction of symmetric contrast and focus kernel has been tested on a Wizard-of-Oz corpus collected in the tutoring dialogue scenario. The corpus consists of 630 non-single word/phrase utterances, containing approximately 5700 words and 48 minutes of speech. The tests used speech waveforms together with manual orthographic transcriptions, and yielded an accuracy of 83.8% for focus kernel detection and 92.8% for symmetric contrast detection. Our tests also demonstrated that the spectral balance cepstral coefficients, the semantic dissimilarity measure, and part-of-speech played important roles in the symmetric contrast and focus kernel detections.

Original languageEnglish (US)
Pages (from-to)437-462
Number of pages26
JournalSpeech Communication
Volume48
Issue number3-4
DOIs
StatePublished - Mar 1 2006

Fingerprint

pragmatics
Semantics
semantics
kernel
Dissimilarity Measure
Linguistics
Tagging
Transcription
linguistics
Computational linguistics
Computational Linguistics
computational linguistics
Dialogue Systems
Dissimilarity
Coefficient
Waveform
Demonstrate
Replication
Speech
Kernel

Keywords

  • Computational linguistics
  • Information extraction
  • Spoken dialogue systems
  • Spoken language understanding

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Extraction of pragmatic and semantic salience from spontaneous spoken English. / Zhang, Tong; Hasegawa-Johnson, Mark Allan; Levinson, Stephen E.

In: Speech Communication, Vol. 48, No. 3-4, 01.03.2006, p. 437-462.

Research output: Contribution to journalArticle

@article{ee032254704849f199d65e276487e1e1,
title = "Extraction of pragmatic and semantic salience from spontaneous spoken English",
abstract = "This paper computationalizes two linguistic concepts, contrast and focus, for the extraction of pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics, as categories that link intonation and information/discourse structure. This paper demonstrates the automatic tagging of contrast and focus for the purpose of robust spontaneous speech understanding in a tutorial dialogue system. In particular, we propose two new transcription tasks, and demonstrate automatic replication of human labels in both tasks. First, we define focus kernel to represent those words that contain novel information neither presupposed by the interlocutor nor contained in the precedent words of the utterance. We propose detecting the focus kernel based on a word dissimilarity measure, part-of-speech tagging, and prosodic measurements including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. In order to measure the word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, which consists of a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. The symmetric contrast identification is performed in a way similar to the focus kernel detection. The effectiveness of the proposed extraction of symmetric contrast and focus kernel has been tested on a Wizard-of-Oz corpus collected in the tutoring dialogue scenario. The corpus consists of 630 non-single word/phrase utterances, containing approximately 5700 words and 48 minutes of speech. The tests used speech waveforms together with manual orthographic transcriptions, and yielded an accuracy of 83.8{\%} for focus kernel detection and 92.8{\%} for symmetric contrast detection. Our tests also demonstrated that the spectral balance cepstral coefficients, the semantic dissimilarity measure, and part-of-speech played important roles in the symmetric contrast and focus kernel detections.",
keywords = "Computational linguistics, Information extraction, Spoken dialogue systems, Spoken language understanding",
author = "Tong Zhang and Hasegawa-Johnson, {Mark Allan} and Levinson, {Stephen E}",
year = "2006",
month = "3",
day = "1",
doi = "10.1016/j.specom.2005.07.007",
language = "English (US)",
volume = "48",
pages = "437--462",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "3-4",

}

TY - JOUR

T1 - Extraction of pragmatic and semantic salience from spontaneous spoken English

AU - Zhang, Tong

AU - Hasegawa-Johnson, Mark Allan

AU - Levinson, Stephen E

PY - 2006/3/1

Y1 - 2006/3/1

N2 - This paper computationalizes two linguistic concepts, contrast and focus, for the extraction of pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics, as categories that link intonation and information/discourse structure. This paper demonstrates the automatic tagging of contrast and focus for the purpose of robust spontaneous speech understanding in a tutorial dialogue system. In particular, we propose two new transcription tasks, and demonstrate automatic replication of human labels in both tasks. First, we define focus kernel to represent those words that contain novel information neither presupposed by the interlocutor nor contained in the precedent words of the utterance. We propose detecting the focus kernel based on a word dissimilarity measure, part-of-speech tagging, and prosodic measurements including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. In order to measure the word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, which consists of a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. The symmetric contrast identification is performed in a way similar to the focus kernel detection. The effectiveness of the proposed extraction of symmetric contrast and focus kernel has been tested on a Wizard-of-Oz corpus collected in the tutoring dialogue scenario. The corpus consists of 630 non-single word/phrase utterances, containing approximately 5700 words and 48 minutes of speech. The tests used speech waveforms together with manual orthographic transcriptions, and yielded an accuracy of 83.8% for focus kernel detection and 92.8% for symmetric contrast detection. Our tests also demonstrated that the spectral balance cepstral coefficients, the semantic dissimilarity measure, and part-of-speech played important roles in the symmetric contrast and focus kernel detections.

AB - This paper computationalizes two linguistic concepts, contrast and focus, for the extraction of pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics, as categories that link intonation and information/discourse structure. This paper demonstrates the automatic tagging of contrast and focus for the purpose of robust spontaneous speech understanding in a tutorial dialogue system. In particular, we propose two new transcription tasks, and demonstrate automatic replication of human labels in both tasks. First, we define focus kernel to represent those words that contain novel information neither presupposed by the interlocutor nor contained in the precedent words of the utterance. We propose detecting the focus kernel based on a word dissimilarity measure, part-of-speech tagging, and prosodic measurements including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. In order to measure the word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, which consists of a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. The symmetric contrast identification is performed in a way similar to the focus kernel detection. The effectiveness of the proposed extraction of symmetric contrast and focus kernel has been tested on a Wizard-of-Oz corpus collected in the tutoring dialogue scenario. The corpus consists of 630 non-single word/phrase utterances, containing approximately 5700 words and 48 minutes of speech. The tests used speech waveforms together with manual orthographic transcriptions, and yielded an accuracy of 83.8% for focus kernel detection and 92.8% for symmetric contrast detection. Our tests also demonstrated that the spectral balance cepstral coefficients, the semantic dissimilarity measure, and part-of-speech played important roles in the symmetric contrast and focus kernel detections.

KW - Computational linguistics

KW - Information extraction

KW - Spoken dialogue systems

KW - Spoken language understanding

UR - http://www.scopus.com/inward/record.url?scp=32144453060&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32144453060&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2005.07.007

DO - 10.1016/j.specom.2005.07.007

M3 - Article

AN - SCOPUS:32144453060

VL - 48

SP - 437

EP - 462

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 3-4

ER -