Speech enhancement using Bayesian WaveNet

Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florèncio, Mark Allan Hasegawa-Johnson

Research output: Contribution to journal › Conference article

Abstract

In recent years, deep learning has achieved great success in speech enhancement. However, there are two major limitations in existing work. First, many of these deep-learning-based algorithms do not adopt the Bayesian framework. In particular, the prior distribution for speech in the Bayesian framework has been shown to be useful because it regularizes the output to lie in the speech space and thus improves performance. Second, the majority of existing methods operate in the frequency domain of the noisy speech, for example on the spectrogram and its variations. The clean speech is then reconstructed by overlap-add, which is limited by an inherent performance upper bound. This paper presents a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which operates directly on raw audio samples. It adopts the recently proposed WaveNet, which has been shown to be effective at modeling conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech.
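
As an informal sketch of the Bayesian framework described above (the symbols x and y and the factorization below are generic notation, not taken from the paper), the enhancer estimates the clean waveform x from the noisy observation y through the posterior

\[
  p(x \mid y) \;\propto\; p(y \mid x)\, p(x),
  \qquad
  p(x) \;=\; \prod_{t} p(x_t \mid x_1, \dots, x_{t-1}),
\]

where p(y | x) plays the role of a noise/likelihood model and the prior p(x), modeled autoregressively over raw samples by WaveNet, regularizes the estimate toward the speech space. The exact factorization and conditioning used in BaWN may differ from this sketch.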

Original language: English (US)
Pages (from-to): 2013-2017
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOI: 10.21437/Interspeech.2017-1672
State: Published - Jan 1, 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20, 2017 - Aug 24, 2017

Keywords

  • Bayesian Framework
  • Convolutional Neural Network
  • Model-Based
  • Speech Enhancement
  • Wavenet

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Speech enhancement using Bayesian WaveNet. / Qian, Kaizhi; Zhang, Yang; Chang, Shiyu; Yang, Xuesong; Florèncio, Dinei; Hasegawa-Johnson, Mark Allan.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2017-August, 01.01.2017, p. 2013-2017.

Research output: Contribution to journal › Conference article

Qian, Kaizhi ; Zhang, Yang ; Chang, Shiyu ; Yang, Xuesong ; Florèncio, Dinei ; Hasegawa-Johnson, Mark Allan. / Speech enhancement using Bayesian WaveNet. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2017 ; Vol. 2017-August. pp. 2013-2017.
@article{588ac0cce91f4efbb981a519c94a4931,
title = "Speech enhancement using bayesianwavenet",
abstract = "In recent years, deep learning has achieved great success in speech enhancement. However, there are two major limitations regarding existing works. First, the Bayesian framework is not adopted in many such deep-learning-based algorithms. In particular, the prior distribution for speech in the Bayesian framework has been shown useful by regularizing the output to be in the speech space, and thus improving the performance. Second, the majority of the existing methods operate on the frequency domain of the noisy speech, such as spectrogram and its variations. The clean speech is then reconstructed using the approach of overlap-Add, which is limited by its inherent performance upper bound. This paper presents a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which directly operates on raw audio samples. It adopts the recently announced WaveNet, which is shown to be effective in modeling conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech.",
keywords = "Bayesian Framework, Convolutional Neural Network, Model-Based, Speech Enhancement, Wavenet",
author = "Kaizhi Qian and Yang Zhang and Shiyu Chang and Xuesong Yang and Dinei Flor{\`e}ncio and Hasegawa-Johnson, {Mark Allan}",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-1672",
language = "English (US)",
volume = "2017-August",
pages = "2013--2017",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR
T1 - Speech enhancement using Bayesian WaveNet
AU - Qian, Kaizhi
AU - Zhang, Yang
AU - Chang, Shiyu
AU - Yang, Xuesong
AU - Florèncio, Dinei
AU - Hasegawa-Johnson, Mark Allan
PY - 2017/1/1
Y1 - 2017/1/1
N2 - In recent years, deep learning has achieved great success in speech enhancement. However, there are two major limitations regarding existing works. First, the Bayesian framework is not adopted in many such deep-learning-based algorithms. In particular, the prior distribution for speech in the Bayesian framework has been shown useful by regularizing the output to be in the speech space, and thus improving the performance. Second, the majority of the existing methods operate on the frequency domain of the noisy speech, such as spectrogram and its variations. The clean speech is then reconstructed using the approach of overlap-Add, which is limited by its inherent performance upper bound. This paper presents a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which directly operates on raw audio samples. It adopts the recently announced WaveNet, which is shown to be effective in modeling conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech.
AB - In recent years, deep learning has achieved great success in speech enhancement. However, there are two major limitations regarding existing works. First, the Bayesian framework is not adopted in many such deep-learning-based algorithms. In particular, the prior distribution for speech in the Bayesian framework has been shown useful by regularizing the output to be in the speech space, and thus improving the performance. Second, the majority of the existing methods operate on the frequency domain of the noisy speech, such as spectrogram and its variations. The clean speech is then reconstructed using the approach of overlap-Add, which is limited by its inherent performance upper bound. This paper presents a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which directly operates on raw audio samples. It adopts the recently announced WaveNet, which is shown to be effective in modeling conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech.
KW - Bayesian Framework
KW - Convolutional Neural Network
KW - Model-Based
KW - Speech Enhancement
KW - Wavenet
UR - http://www.scopus.com/inward/record.url?scp=85039147536&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039147536&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-1672
DO - 10.21437/Interspeech.2017-1672
M3 - Conference article
AN - SCOPUS:85039147536
VL - 2017-August
SP - 2013
EP - 2017
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
ER -