The technique of dynamic time warping for time registration of a reference and test utterance has found widespread use in the areas of speaker verification and discrete word recognition. As originally proposed, the algorithm placed strong constraints on the possible set of dynamic paths-namely it was assumed that the initial and final frames of both the test and reference utterances were in exact time synchrony. Because of inherent practical difficulties with satisfying the assumptions under which the above constraints are valid, we have considered some modifications to the dynamic time warping algorithm. In particular, an algorithm in which an uncertainty exists in the registration both for initial and final frames was studied. Another modification constrains the dynamic path to follow (within a given range) the path which is locally optimum at each frame. This modification tends to work well when the location of the final frame of the test utterance is significantly in error due to breath noise, etc. To test the different time warping algorithms a set of ten isolated words spoken by 100 speakers was used. Probability density functions of the distances from each of the 100 versions of a word to a reference version of the word were estimated for each of three dynamic warping algorithms. From these data, it is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time warping algorithm with unconstrained endpoints. A discussion of this key result along with some ideas on where the other modifications would be most useful is included.
|Original language||English (US)|
|Number of pages||8|
|Journal||IEEE Transactions on Acoustics, Speech, and Signal Processing|
|State||Published - Dec 1978|
ASJC Scopus subject areas
- Signal Processing