TY - GEN
T1 - Latent Dirichlet decomposition for single channel speaker separation
AU - Raj, Bhiksha
AU - Shashanka, Madhusudana V.S.
AU - Smaragdis, Paris
PY - 2006
Y1 - 2006
N2 - We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modeled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learned from training signals for each speaker. We model the prior distribution of the mixture weights for each speaker as a Dirichlet distribution. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e., the spectrum, for each speaker is reconstructed from this decomposition.
AB - We present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of the process is modeled as a mixture of multinomial distributions, such that the mixture weights of the component multinomials vary from analysis window to analysis window. The component multinomials are assumed to be speaker specific and are learned from training signals for each speaker. We model the prior distribution of the mixture weights for each speaker as a Dirichlet distribution. The distributions representing magnitude spectral vectors for the mixed signal are decomposed into mixtures of the multinomials for all component speakers. The frequency distribution, i.e., the spectrum, for each speaker is reconstructed from this decomposition.
UR - http://www.scopus.com/inward/record.url?scp=33947654932&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947654932&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33947654932
SN - 142440469X
SN - 9781424404698
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - V821-V824
BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Y2 - 14 May 2006 through 19 May 2006
ER -