Abstract
In this paper we introduce a new Markov model that can recognize speech in recordings of multiple a priori known speakers talking simultaneously. It builds on recent work on non-negative representations of spectrograms, which have been shown to be very effective in source separation problems. We extend these approaches to design a Markov selection model that can recognize sequences even when they are presented mixed together, without the need to separate the signals. Unlike factorial Markov models, which have been used for similar purposes in the past and whose state spaces are exponential in the number of sources, this approach yields a low-complexity model whose state space is linear in the number of sources. We demonstrate the use of this framework in recognizing speech from mixtures of known speakers.
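The contrast in state-space growth drawn in the abstract can be illustrated with a small sketch. It is not the paper's model. It only counts states, assuming every speaker is modeled with the same number of states: a factorial model needs one joint state per combination of per-speaker states, while a linear state space is assumed here to pool the per-speaker states. The function names and the value of 40 states per speaker are hypothetical and chosen for illustration only.

```python
def factorial_state_count(states_per_speaker: int, num_speakers: int) -> int:
    """Joint states when every combination of per-speaker states is its own state."""
    return states_per_speaker ** num_speakers


def linear_state_count(states_per_speaker: int, num_speakers: int) -> int:
    """States when per-speaker state sets are pooled (assumed form of 'linear')."""
    return states_per_speaker * num_speakers


if __name__ == "__main__":
    states_per_speaker = 40  # hypothetical value, not taken from the paper
    for num_speakers in (1, 2, 3, 4):
        print(
            num_speakers,
            factorial_state_count(states_per_speaker, num_speakers),
            linear_state_count(states_per_speaker, num_speakers),
        )
```

Under these assumptions, adding a speaker multiplies the factorial count by the number of states per speaker, whereas the pooled count grows by only that many states.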
Original language | English (US) |
---|---|
Pages (from-to) | 64-72 |
Number of pages | 9 |
Journal | Neurocomputing |
Volume | 80 |
DOIs | |
State | Published - Mar 15 2012 |
Keywords
- Markov models
- Speech recognition
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence