Glottal model based speech beamforming for Ad-hoc microphone arrays

Yang Zhang, Dinei Florèncio, Mark Allan Hasegawa-Johnson

Research output: Contribution to journal › Conference article

Abstract

We are interested in the task of speech beamforming in conference room meetings, with microphones built into the electronic devices brought and casually placed by meeting participants. This task is challenging because of inaccurate position and interference calibration caused by the random microphone configuration, varying microphone quality, reverberation, etc. As a result, few beamforming algorithms perform better than simply picking the closest microphone in this setting. We propose a beamforming algorithm called Glottal Residual Assisted Beamforming (GRAB). It does not rely on any position or interference calibration. Instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB suppresses noise effectively while keeping the speech natural and dry. Further analyses reveal that GRAB can identify contaminated or reverberant channels and treat them accordingly.
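The central idea stated above (combine the channels so that the output leaves as little energy as possible unexplained by a source-filter model) can be illustrated with a small NumPy sketch. This is not GRAB as published. It assumes a single real gain per channel (no delays or frequency-dependent filters), uses plain autocorrelation-method LPC as the source-filter model, alternates between refitting the LPC model and re-solving for the weights, and imposes sum(w) = 1 to exclude the trivial all-zero solution. The function names `lpc_coefficients` and `residual_min_weights` are hypothetical.

```python
import numpy as np


def lpc_coefficients(y, order):
    """Autocorrelation-method LPC: returns a_1..a_p so that y[n] ~ sum_k a_k * y[n-k]."""
    n = len(y)
    # Autocorrelation at lags 0..order.
    r = np.array([y[: n - k] @ y[k:] for k in range(order + 1)])
    # Symmetric Toeplitz normal-equation matrix R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    toeplitz = r[idx] + 1e-9 * r[0] * np.eye(order)  # tiny diagonal loading for stability
    return np.linalg.solve(toeplitz, r[1 : order + 1])


def residual_min_weights(x, order=10, iters=5):
    """Hypothetical sketch: channel weights w (sum(w) == 1) that minimize the
    LPC residual energy of the combined output y = w @ x.

    x has shape (channels, samples). Each channel gets one real gain
    (an assumption made for brevity, not the paper's beamformer structure).
    """
    m, n = x.shape
    w = np.full(m, 1.0 / m)  # start from a plain channel average
    for _ in range(iters):
        y = w @ x  # current beamformer output
        a = lpc_coefficients(y, order)  # all-pole fit to the current output
        h = np.concatenate(([1.0], -a))  # prediction-error (inverse) filter A(z)
        # A(z) is linear, so the output residual equals sum_m w_m * (A(z) x_m).
        e = np.array([np.convolve(ch, h)[:n] for ch in x])
        cov = e @ e.T / n  # residual covariance across channels
        # Minimize w^T cov w subject to sum(w) == 1: w is proportional to cov^{-1} 1.
        w = np.linalg.solve(cov, np.ones(m))
        w /= w.sum()
    return w
```

On two channels that share the same predictable signal, where the second carries ten times more white noise, the weights should lean strongly toward the first channel: white noise cannot be predicted by the all-pole model, so it lands almost entirely in the residual that the weights are chosen to minimize.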

Original language: English (US)
Pages (from-to): 2675-2679
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOI: 10.21437/Interspeech.2017-1659
State: Published - Jan 1 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20 2017 - Aug 24 2017

Fingerprint

Microphone Array
Beamforming
Microphones
Model-based
Calibration
Interference
Subjective Evaluation
Reverberation
Speech
Electronics
Filter
Minimise
Configuration
Energy
Model
Simulation

Keywords

  • Ad-Hoc Microphone Array
  • Beamforming
  • LPC Residual
  • Speech Enhancement
  • Speech Model

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Glottal model based speech beamforming for Ad-hoc microphone arrays. / Zhang, Yang; Florèncio, Dinei; Hasegawa-Johnson, Mark Allan.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2017-August, 01.01.2017, p. 2675-2679.


@article{bdba25168ced43e2af0f8b9b67a50916,
title = "Glottal model based speech beamforming for Ad-hoc microphone arrays",
abstract = "We are interested in the task of speech beamforming in conference room meetings, with microphones built into the electronic devices brought and casually placed by meeting participants. This task is challenging because of inaccurate position and interference calibration caused by the random microphone configuration, varying microphone quality, reverberation, etc. As a result, few beamforming algorithms perform better than simply picking the closest microphone in this setting. We propose a beamforming algorithm called Glottal Residual Assisted Beamforming (GRAB). It does not rely on any position or interference calibration. Instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB suppresses noise effectively while keeping the speech natural and dry. Further analyses reveal that GRAB can identify contaminated or reverberant channels and treat them accordingly.",
keywords = "Ad-Hoc Microphone Array, Beamforming, LPC Residual, Speech Enhancement, Speech Model",
author = "Yang Zhang and Dinei Flor{\`e}ncio and Hasegawa-Johnson, {Mark Allan}",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-1659",
language = "English (US)",
volume = "2017-August",
pages = "2675--2679",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY  - JOUR
T1  - Glottal model based speech beamforming for Ad-hoc microphone arrays
AU  - Zhang, Yang
AU  - Florèncio, Dinei
AU  - Hasegawa-Johnson, Mark Allan
PY  - 2017/1/1
Y1  - 2017/1/1
N2  - We are interested in the task of speech beamforming in conference room meetings, with microphones built into the electronic devices brought and casually placed by meeting participants. This task is challenging because of inaccurate position and interference calibration caused by the random microphone configuration, varying microphone quality, reverberation, etc. As a result, few beamforming algorithms perform better than simply picking the closest microphone in this setting. We propose a beamforming algorithm called Glottal Residual Assisted Beamforming (GRAB). It does not rely on any position or interference calibration. Instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB suppresses noise effectively while keeping the speech natural and dry. Further analyses reveal that GRAB can identify contaminated or reverberant channels and treat them accordingly.
AB  - We are interested in the task of speech beamforming in conference room meetings, with microphones built into the electronic devices brought and casually placed by meeting participants. This task is challenging because of inaccurate position and interference calibration caused by the random microphone configuration, varying microphone quality, reverberation, etc. As a result, few beamforming algorithms perform better than simply picking the closest microphone in this setting. We propose a beamforming algorithm called Glottal Residual Assisted Beamforming (GRAB). It does not rely on any position or interference calibration. Instead, it incorporates a source-filter speech model and minimizes the energy that cannot be accounted for by the model. Objective and subjective evaluations on both simulated and real-world data show that GRAB suppresses noise effectively while keeping the speech natural and dry. Further analyses reveal that GRAB can identify contaminated or reverberant channels and treat them accordingly.
KW  - Ad-Hoc Microphone Array
KW  - Beamforming
KW  - LPC Residual
KW  - Speech Enhancement
KW  - Speech Model
UR  - http://www.scopus.com/inward/record.url?scp=85039158040&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85039158040&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2017-1659
DO  - 10.21437/Interspeech.2017-1659
M3  - Conference article
AN  - SCOPUS:85039158040
VL  - 2017-August
SP  - 2675
EP  - 2679
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  - 