On Temporal Alignment of Sentences of Natural and Synthetic Speech

Hans D. Hühne, Cecil Coker, Stephen E. Levinson, Lawrence R. Rabiner

Research output: Contribution to journal › Article › peer-review


One way to improve the quality of synthetic speech, and to learn about temporal aspects of speech recognition, is to study the problem of time-aligning pairs of spoken sentences. For example, one could evaluate various sets of duration rules for synthesis by comparing the time alignments of speech sounds within synthetic sentences with those of naturally spoken sentences. In this manner, an improved set of sound duration rules could be obtained by applying an objective measure to the alignment scores. For speech recognition applications, aligning continuous speech against a hand-marked prototype could yield automatic labeling, from which models and/or statistical data on sounds within sentences could be obtained. A key question in the automatic alignment of sentence-length utterances is whether the time-warping methods developed for isolated word recognition can be extended to utterances up to several seconds long. A second key question is the reliability and accuracy of such an alignment. In this paper we investigate these questions. It is shown that, with some simple modifications, the dynamic time warping procedures used for isolated word recognition apply almost as well to the alignment of sentence-length utterances. It is also shown that, on average, the uncertainty in the location of significant events within the sentence is much smaller than the event durations, although the largest errors are longer than some event durations. Hence, caution is needed when using the time alignment contour for synthesis or recognition applications.
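To make the alignment idea concrete, here is a minimal sketch of basic dynamic time warping (DTW) applied to two frame sequences, followed by transfer of hand-marked event frames from a reference to a test utterance and a summary of the resulting location errors. It is not the authors' procedure: the "simple modifications" mentioned in the abstract are not reproduced, and the Euclidean frame distance, the unweighted step pattern (1,0), (0,1), (1,1), the fixed endpoints, the first-match rule in `map_events`, and the 10 ms frame period in `location_errors` are all assumptions chosen for illustration. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per analysis frame (assumed)


def dtw_path(ref: List[Frame], test: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Basic symmetric DTW (assumed: Euclidean distance, no slope constraint)."""
    n, m = len(ref), len(test)
    inf = math.inf
    # cost[i][j]: minimal accumulated distance aligning ref[:i+1] with test[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(ref[i], test[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best
    # Backtrack from the fixed end point (n-1, m-1) to (0, 0).
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def map_events(path: List[Tuple[int, int]], ref_events: List[int]) -> List[int]:
    """Transfer event frames from ref to test; uses the first matching test frame."""
    first = {}
    for i, j in path:
        first.setdefault(i, j)
    return [first[e] for e in ref_events]


def location_errors(mapped: List[int], hand_marked: List[int],
                    frame_ms: float = 10.0) -> Tuple[float, float]:
    """Mean and maximum absolute event-location error in ms (frame_ms assumed)."""
    errs = [abs(a - b) * frame_ms for a, b in zip(mapped, hand_marked)]
    return sum(errs) / len(errs), max(errs)


# Toy 1-D example: the test utterance is a time-stretched copy of the reference.
ref = [(0.0,), (1.0,), (2.0,)]
test = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,)]
dist, path = dtw_path(ref, test)
mapped = map_events(path, [0, 1, 2])
print(dist, path, mapped)
print(location_errors(mapped, [0, 3, 4]))
```

In the toy example the stretched copy aligns with zero accumulated distance, and the mapped events land at test frames 0, 2 and 4. Comparing them with a hypothetical hand marking of 0, 3 and 4 gives a small mean error but a larger worst case. This is the kind of mean-versus-maximum distinction the abstract draws when it cautions against relying on the alignment contour.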

Original language: English (US)
Pages (from-to): 807-813
Number of pages: 7
Journal: IEEE Transactions on Acoustics, Speech, and Signal Processing
Issue number: 4
State: Published - Aug 1983
Externally published: Yes

ASJC Scopus subject areas

  • Signal Processing

