Latent Dirichlet decomposition for single-channel speaker separation

Bhiksha Raj, Madhusudana V.S. Shashanka, Paris Smaragdis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We present an algorithm for separating multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The distribution of this process is modeled as a mixture of multinomial distributions whose mixture weights vary from analysis window to analysis window. The component multinomials are assumed to be speaker-specific and are learned from training signals for each speaker. We model the prior distribution of each speaker's mixture weights as a Dirichlet distribution. The distributions representing the magnitude spectral vectors of the mixed signal are decomposed into mixtures of the multinomials of all component speakers. The frequency distribution, i.e., the spectrum, of each speaker is then reconstructed from this decomposition.
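The abstract does not state the update equations, so the following is only a minimal NumPy sketch of the kind of procedure it describes, under stated assumptions. It assumes standard EM (PLCA-style) updates for a per-frame mixture of multinomials over frequency bins, a MAP update for the Dirichlet prior on the per-frame weights (adding `alpha - 1` pseudo-counts, which is exact for `alpha >= 1`; clipping at zero for `alpha < 1` is a common heuristic), and soft-mask reconstruction of each speaker's spectrum. The function names `plca` and `separate` and the parameter `alpha` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

EPS = 1e-12

def _normalize(x, axis):
    """Rescale x so it sums to one along `axis` (columns become distributions)."""
    return x / (x.sum(axis=axis, keepdims=True) + EPS)

def plca(V, n_components=None, W=None, alpha=1.0, n_iter=200, seed=0):
    """EM for a per-frame mixture of multinomials over frequency bins (sketch).

    V: (F, T) nonnegative magnitude spectrogram.
    W: optional fixed (F, Z) matrix whose columns are the multinomials P(f|z).
    alpha: Dirichlet concentration for the per-frame weights P_t(z)
           (scalar or length-Z); alpha=1 is a flat prior, i.e. plain ML.
    Returns (W, H): columns of W are P(f|z), columns of H are P_t(z).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    learn_W = W is None
    if learn_W:
        W = _normalize(rng.random((F, n_components)), axis=0)
    Z = W.shape[1]
    H = _normalize(rng.random((Z, T)), axis=0)
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), (Z,))[:, None]
    for _ in range(n_iter):
        Q = V / (W @ H + EPS)          # observed spectrum divided by model
        counts_H = H * (W.T @ Q)       # expected counts for each (z, t)
        if learn_W:
            counts_W = W * (Q @ H.T)   # expected counts for each (f, z)
            W = _normalize(counts_W, axis=0)
        # Assumed MAP M-step under a Dirichlet(alpha) prior on each column of H
        H = _normalize(np.maximum(counts_H + alpha - 1.0, 0.0), axis=0)
    return W, H

def separate(V_mix, W_speakers, alphas, n_iter=200):
    """Decompose a mixture onto fixed per-speaker multinomials (sketch).

    Each speaker's spectrum is reconstructed with a soft mask, an assumed
    choice: V_s = V_mix * (W_s H_s) / (W H).
    """
    W = np.hstack(W_speakers)
    alpha = np.concatenate(
        [np.full(Ws.shape[1], a) for Ws, a in zip(W_speakers, alphas)])
    _, H = plca(V_mix, W=W, alpha=alpha, n_iter=n_iter)
    R = W @ H + EPS
    estimates, start = [], 0
    for Ws in W_speakers:
        k = Ws.shape[1]
        estimates.append(V_mix * (Ws @ H[start:start + k]) / R)
        start += k
    return estimates

# Toy demo: two synthetic "speakers" occupying disjoint frequency bands.
rng = np.random.default_rng(1)
F, T = 64, 80
V1 = np.zeros((F, T)); V1[:32] = rng.random((32, T))
V2 = np.zeros((F, T)); V2[32:] = rng.random((32, T))
W1, _ = plca(V1, n_components=4, n_iter=50)   # train speaker 1 components
W2, _ = plca(V2, n_components=4, n_iter=50)   # train speaker 2 components
est1, est2 = separate(V1 + V2, [W1, W2], alphas=(1.0, 1.0), n_iter=50)
```

Because the masks sum to one in every time-frequency cell, the per-speaker estimates add back up to the mixture. Real speech would use actual STFT magnitudes and overlapping speaker spectra, where separation is only approximate.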

Original language: English (US)
Title of host publication: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
Pages: V821-V824
State: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse, France
Duration: May 14, 2006 to May 19, 2006

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 5
ISSN (Print): 1520-6149

Other

Other: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Country/Territory: France
City: Toulouse
Period: 5/14/06 to 5/19/06

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

