A general algorithm for word graph matrix decomposition

Dilek Hakkani-Tür, Giuseppe Riccardi

Research output: Contribution to journal › Conference article › peer-review

Abstract

In automatic speech recognition, word graphs (lattices) are commonly used as an approximate representation of the complete word search space. These word lattices are usually acyclic and have no a priori structure. More recently, a new class of normalized word lattices has been proposed. These lattices (a.k.a. sausages) are very space-efficient and provide a normalization (chunking) of the lattice by aligning words from all possible hypotheses. In this paper we propose a general framework for lattice chunking, the pivot algorithm. The pivot algorithm has four important properties. First, time information is not necessary but is beneficial for overall performance. Second, the algorithm allows a predefined chunk structure to be specified for the final word lattice. Third, the algorithm operates on both weighted and unweighted lattices. Fourth, the labels on the graph are generic and can be words as well as part-of-speech tags or parse tags. While the algorithm has applications to many tasks (e.g., parsing, named entity extraction), we present results on the performance of confidence scores for different large-vocabulary speech recognition tasks. We compare the results of our algorithms against off-the-shelf methods and show significant improvements.
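To make the "sausage" structure concrete, the following is a minimal sketch of a chunked lattice and of how per-word confidence scores can be read from it. It is not the pivot algorithm itself, which the abstract does not specify. The names `Sausage`, `normalize`, `best_path`, and the `<eps>` symbol for an empty slot are assumptions made for illustration, and all weights are assumed to be positive.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a sausage is an ordered list of slots (chunks).
# Each slot maps a candidate label to a non-negative weight.
# "<eps>" is an assumed symbol meaning "no word in this slot".
Sausage = List[Dict[str, float]]


def normalize(sausage: Sausage) -> Sausage:
    """Rescale every slot so that its weights sum to 1 (assumes positive totals)."""
    result = []
    for slot in sausage:
        total = sum(slot.values())
        result.append({label: weight / total for label, weight in slot.items()})
    return result


def best_path(sausage: Sausage) -> List[Tuple[str, float]]:
    """Pick the highest-weight label in each slot.

    The normalized weight of the chosen label is used as its confidence score.
    """
    return [max(slot.items(), key=lambda item: item[1]) for slot in sausage]


# Three aligned slots built from competing hypotheses (made-up weights).
example = normalize([
    {"i": 3.0, "a": 1.0},
    {"want": 2.0, "won't": 1.0, "<eps>": 1.0},
    {"to": 4.0},
])
hypothesis = best_path(example)
```

An unweighted lattice can be represented in the same structure by giving every label in a slot the same weight, in which case normalization yields a uniform distribution per slot.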

Original language: English (US)
Pages (from-to): 596-599
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
State: Published - 2003
Externally published: Yes
Event: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing - Hong Kong, Hong Kong
Duration: Apr 6 2003 to Apr 10 2003

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
