Sparse mixture of local experts for efficient speech enhancement

Aswin Sivaraman, Minje Kim

Research output: Contribution to journalConference articlepeer-review


This work proposes a novel approach for reducing the computational complexity of speech denoising neural networks by using a sparsely active ensemble topology. In our ensemble networks, a gating module classifies an input noisy speech signal either by identifying speaker gender or by estimating signal degradation, and exclusively assigns it to a best-case specialist module, optimized to denoise a particular subset of the training data. This approach extends the hypothesis that speech denoising can be simplified if it is split into non-overlapping subproblems, contrasting earlier approaches that train large generalist neural networks to address a wide range of noisy speech data. We compare a baseline recurrent network against an ensemble of similarly designed, but smaller networks. Each network module is trained independently and combined to form a naïve ensemble. This can be further fine-tuned using a sparsity parameter to improve performance. Our experiments on noisy speech data-generated by mixing LibriSpeech and MUSAN datasets-demonstrate that a fine-tuned sparsely active ensemble can outperform a generalist using significantly fewer calculations. The key insight of this paper, leveraging model selection as a form of network compression, may be used to supplement already-existing deep learning methods for speech denoising.

Original languageEnglish (US)
Pages (from-to)4526-4530
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
StatePublished - 2020
Externally publishedYes
Event21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020Oct 29 2020


  • Adaptive mixture of local experts
  • Model selection
  • Neural network compression
  • Speech enhancement

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation


Dive into the research topics of 'Sparse mixture of local experts for efficient speech enhancement'. Together they form a unique fingerprint.

Cite this