In this paper we present and evaluate factored methods for recognition of simultaneous speech from multiple speakers in single-channel recordings. Factored methods decompose the problem of jointly recognizing the speech of all speakers into separate recognition of the speech of each speaker. To achieve this, the signal components of the target speaker must be enhanced in each case. We do this in two ways: with an NMF-based speaker separation algorithm that generates separated spectra for each speaker, and with a mask estimation method that generates spectral masks for each speaker, which must be used in conjunction with a missing-feature method that can recognize speech from partial spectral data. Experiments on synthetic mixtures of signals from the Wall Street Journal corpus show that both approaches can greatly improve recognition of the individual signals in the mixture.
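The abstract does not specify the separation or mask estimation algorithms, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each speaker has a fixed, pre-trained set of non-negative spectral bases, estimates activations for one mixture frame with Euclidean multiplicative updates, and derives binary masks by comparing the two speaker estimates. The function names, the toy bases, and the dominance rule for the masks are all hypothetical illustrations, not the paper's method.

```python
# Illustrative sketch only: toy NMF-style separation of a single magnitude-spectrum
# frame with fixed per-speaker bases. Names and data are hypothetical.

EPS = 1e-12  # guards against division by zero


def matvec(W, h):
    """Return W @ h for a list-of-rows matrix W and a vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def solve_activations(W, v, n_iter=200):
    """Estimate non-negative h with v ~= W h, keeping the bases W fixed.

    Uses the Euclidean multiplicative update h <- h * (W^T v) / (W^T W h).
    """
    n_bins, n_bases = len(W), len(W[0])
    h = [1.0] * n_bases
    for _ in range(n_iter):
        approx = matvec(W, h)  # current reconstruction, computed from the old h
        for k in range(n_bases):
            num = sum(W[f][k] * v[f] for f in range(n_bins))
            den = sum(W[f][k] * approx[f] for f in range(n_bins)) + EPS
            h[k] *= num / den
    return h


def separate_frame(W_a, W_b, v):
    """Split mixture frame v into two speaker spectra and two binary masks."""
    n_a = len(W_a[0])
    W = [row_a + row_b for row_a, row_b in zip(W_a, W_b)]  # concatenate bases
    h = solve_activations(W, v)
    part_a = matvec(W_a, h[:n_a])
    part_b = matvec(W_b, h[n_a:])
    # Redistribute the observed mixture energy in proportion to each model.
    est_a = [x * a / (a + b + EPS) for x, a, b in zip(v, part_a, part_b)]
    est_b = [x * b / (a + b + EPS) for x, a, b in zip(v, part_a, part_b)]
    # Assumed mask rule: a bin is "reliable" for the speaker that dominates it.
    mask_a = [1 if a >= b else 0 for a, b in zip(est_a, est_b)]
    mask_b = [1 - m for m in mask_a]
    return est_a, est_b, mask_a, mask_b


# Toy example: 4 frequency bins, one basis vector per speaker.
W_a = [[1.0], [0.5], [0.0], [0.0]]
W_b = [[0.0], [0.0], [0.5], [1.0]]
mixture = [2.0, 1.0, 1.5, 3.0]  # equals 2 * basis_a + 3 * basis_b

est_a, est_b, mask_a, mask_b = separate_frame(W_a, W_b, mixture)
print(est_a, est_b, mask_a, mask_b)
```

In this sketch the separated spectra (`est_a`, `est_b`) stand in for the output that would be passed to a conventional recognizer, while the masks (`mask_a`, `mask_b`) mark the spectral components that a missing-feature recognizer would treat as reliable for each speaker.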
Published: 2005
Conference: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal
Duration: Sep 4 2005 → Sep 8 2005