TY - GEN
T1 - Latent variable decomposition of spectrograms for single channel speaker separation
AU - Raj, Bhiksha
AU - Smaragdis, Paris
PY - 2005
Y1 - 2005
N2 - In this paper we present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modelled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learnt from training signals for each speaker. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e. the spectrum, for each speaker is reconstructed from this decomposition. Experimental results show that the proposed method is very effective at separating mixed signals.
AB - In this paper we present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modelled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learnt from training signals for each speaker. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e. the spectrum, for each speaker is reconstructed from this decomposition. Experimental results show that the proposed method is very effective at separating mixed signals.
UR - http://www.scopus.com/inward/record.url?scp=33749064773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749064773&partnerID=8YFLogxK
U2 - 10.1109/ASPAA.2005.1540157
DO - 10.1109/ASPAA.2005.1540157
M3 - Conference contribution
AN - SCOPUS:33749064773
SN - 0780391543
SN - 9780780391543
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 17
EP - 20
BT - 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
T2 - 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Y2 - 16 October 2005 through 19 October 2005
ER -