High/Low Model for Scalable Multimicrophone Enhancement of Speech Mixtures

Ryan M. Corey, Andrew C. Singer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many speech separation and enhancement methods take advantage of time-frequency sparsity by assuming that only one speech source in a mixture has nonzero power at each time and frequency. This “on/off” model is valuable for systems with more sources than microphones, but many methods that use it do not benefit from the spatial diversity of systems with large numbers of microphones. This work considers the high/low model, in which one source is strongest at each time-frequency index but all sources have nonzero power. A time-varying enhancement method using the high/low model combines the benefits of sparsity and spatial diversity and scales automatically with the number of microphones, resembling a time-frequency mask for underdetermined systems and a linear filter for overdetermined systems. The model is demonstrated using real-room data with up to 10 speech signals and between 1 and 160 microphones.
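The scaling behavior described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a narrowband model in which each source has a known steering vector, the source variances take only two values (`high` for the strongest source, `low` for all others), and the estimate is a standard multichannel Wiener filter recomputed for each state. The function names, the variance and noise values, and the idealized DFT-like steering vectors are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical normalized spatial frequencies for three sources. With 16
# microphones these give mutually orthogonal steering vectors (idealized).
FREQS = [0.0, 0.25, 0.5]


def steering(num_mics, freqs):
    """Return an (M, N) matrix whose columns are idealized steering vectors."""
    m = np.arange(num_mics)[:, None]
    return np.exp(2j * np.pi * m * np.asarray(freqs)[None, :])


def high_low_filter(A, target, state, high=1.0, low=0.01, noise=1e-3):
    """Wiener filter for source `target` when source `state` is the strong one.

    Assumed model: source k has variance `high` if k == state, else `low`.
    Returns weights w such that conj(w) @ x estimates the target source.
    """
    num_mics, num_src = A.shape
    var = np.full(num_src, low)
    var[state] = high
    # Mixture covariance: A diag(var) A^H + noise * I
    Rx = (A * var) @ A.conj().T + noise * np.eye(num_mics)
    # w = Rx^{-1} (var_target * a_target)
    return np.linalg.solve(Rx, A[:, target] * var[target])


def response(w, a):
    """Magnitude of the filter's gain on a source with steering vector a."""
    return abs(np.vdot(w, a))


if __name__ == "__main__":
    for num_mics in (1, 16):
        A = steering(num_mics, FREQS)
        for state in range(len(FREQS)):
            w = high_low_filter(A, target=0, state=state)
            print(num_mics, state, response(w, A[:, 0]), response(w, A[:, 1]))
```

Under these assumptions, the single-microphone filter passes the target only when the target is the strongest source, much like a time-frequency mask, while the 16-microphone filter passes the target and rejects the interferer in every state, much like a fixed linear filter.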

Original language: English (US)
Title of host publication: 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 880-884
Number of pages: 5
ISBN (Electronic): 9789082797060
State: Published - 2021
Event: 29th European Signal Processing Conference, EUSIPCO 2021 - Dublin, Ireland
Duration: Aug 23 2021 → Aug 27 2021

Publication series

Name: European Signal Processing Conference
Volume: 2021-August
ISSN (Print): 2219-5491

Conference

Conference: 29th European Signal Processing Conference, EUSIPCO 2021
Country/Territory: Ireland
City: Dublin
Period: 8/23/21 → 8/27/21

Keywords

  • Microphone arrays
  • Source separation
  • Speech enhancement

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
