Learning to estimate reverberation time in noisy and reverberant rooms

Xiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L Jones, Eng Siong Chng, Haizhou Li

Research output: Contribution to journal › Conference article

Abstract

The reverberation time, T60, is an important indicator of the reverberation strength in a room and has many applications in speech processing, such as dereverberation. However, the T60 must be blindly estimated if only reverberant speech is available. In this paper, we provide a learning-based approach for T60 estimation. We treat T60 estimation as a classification problem by dividing the T60 range into countable bins (e.g. 19 bins covering 0.1 s to 1 s with a bin width of 0.05 s), so that estimation becomes predicting which bin the true T60 falls into for a given speech signal. We use deep neural networks (DNN) to learn such a mapping from speech to the T60. The DNN is trained on a large amount of reverberant and noisy speech signals generated from various simulated rooms with known reverberation times. After training, we observe that the DNN can learn highly sensible features for the T60 estimation task. Experimental results on data from both simulated rooms and real rooms confirm the effectiveness of the DNN-based learning approach. In all the test cases, the DNN method significantly outperforms the state-of-the-art SDD T60 estimation method.
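The binning described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 19 bins are centred at 0.10 s, 0.15 s, ..., 1.00 s (the reading that matches the stated bin count), and the function names `bin_centers`, `t60_to_class` and `class_to_t60` are hypothetical.

```python
# Sketch of turning T60 regression into classification.
# Assumption: bins are centred at 0.10, 0.15, ..., 1.00 s (19 centres),
# which is consistent with "19 bins covering 0.1 s to 1 s, width 0.05 s".

T60_MIN = 0.1     # seconds, centre of the first bin (assumed)
T60_MAX = 1.0     # seconds, centre of the last bin (assumed)
BIN_WIDTH = 0.05  # seconds


def bin_centers():
    """Return the list of assumed bin centres in seconds."""
    n_bins = int(round((T60_MAX - T60_MIN) / BIN_WIDTH)) + 1
    return [T60_MIN + BIN_WIDTH * i for i in range(n_bins)]


def t60_to_class(t60):
    """Map a true T60 (seconds) to the index of the nearest bin centre.

    Values outside the covered range are clamped to the first or last bin.
    """
    idx = int(round((t60 - T60_MIN) / BIN_WIDTH))
    return min(max(idx, 0), len(bin_centers()) - 1)


def class_to_t60(idx):
    """Map a predicted class index back to a T60 estimate (bin centre)."""
    return bin_centers()[idx]


if __name__ == "__main__":
    label = t60_to_class(0.52)  # training target for a room with T60 = 0.52 s
    print(label, class_to_t60(label))
```

Under this scheme a classifier's output index is converted back to a T60 estimate through the bin centre, so the quantisation error is at most half a bin width for values inside the covered range.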

Original language: English (US)
Pages (from-to): 3431-3435
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
State: Published - Jan 1 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: Sep 6 2015 to Sep 10 2015

Keywords

  • Deep learning
  • Deep neural networks
  • Dereverberation
  • Robust reverberation time estimation
  • T60 estimation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Learning to estimate reverberation time in noisy and reverberant rooms. / Xiao, Xiong; Zhao, Shengkui; Zhong, Xionghu; Jones, Douglas L; Chng, Eng Siong; Li, Haizhou.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2015-January, 01.01.2015, p. 3431-3435.

@article{449af47b7cd140869ecb34762726edb3,
title = "Learning to estimate reverberation time in noisy and reverberant rooms",
abstract = "The reverberation time, T60, is an important indicator of the reverberation strength in a room and has many applications in speech processing, such as dereverberation. However, the T60 must be blindly estimated if only reverberant speech is available. In this paper, we provide a learning-based approach for T60 estimation. We treat T60 estimation as a classification problem by dividing the T60 range into countable bins (e.g. 19 bins covering 0.1 s to 1 s with a bin width of 0.05 s), so that estimation becomes predicting which bin the true T60 falls into for a given speech signal. We use deep neural networks (DNN) to learn such a mapping from speech to the T60. The DNN is trained on a large amount of reverberant and noisy speech signals generated from various simulated rooms with known reverberation times. After training, we observe that the DNN can learn highly sensible features for the T60 estimation task. Experimental results on data from both simulated rooms and real rooms confirm the effectiveness of the DNN-based learning approach. In all the test cases, the DNN method significantly outperforms the state-of-the-art SDD T60 estimation method.",
keywords = "Deep learning, Deep neural networks, Dereverberation, Robust reverberation time estimation, T60 estimation",
author = "Xiong Xiao and Shengkui Zhao and Xionghu Zhong and Jones, {Douglas L} and Chng, {Eng Siong} and Haizhou Li",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
volume = "2015-January",
pages = "3431--3435",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Learning to estimate reverberation time in noisy and reverberant rooms

AU - Xiao, Xiong

AU - Zhao, Shengkui

AU - Zhong, Xionghu

AU - Jones, Douglas L

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2015/1/1

Y1 - 2015/1/1

N2 - The reverberation time, T60, is an important indicator of the reverberation strength in a room and has many applications in speech processing, such as dereverberation. However, the T60 must be blindly estimated if only reverberant speech is available. In this paper, we provide a learning-based approach for T60 estimation. We treat T60 estimation as a classification problem by dividing the T60 range into countable bins (e.g. 19 bins covering 0.1 s to 1 s with a bin width of 0.05 s), so that estimation becomes predicting which bin the true T60 falls into for a given speech signal. We use deep neural networks (DNN) to learn such a mapping from speech to the T60. The DNN is trained on a large amount of reverberant and noisy speech signals generated from various simulated rooms with known reverberation times. After training, we observe that the DNN can learn highly sensible features for the T60 estimation task. Experimental results on data from both simulated rooms and real rooms confirm the effectiveness of the DNN-based learning approach. In all the test cases, the DNN method significantly outperforms the state-of-the-art SDD T60 estimation method.

KW - Deep learning

KW - Deep neural networks

KW - Dereverberation

KW - Robust reverberation time estimation

KW - T60 estimation

UR - http://www.scopus.com/inward/record.url?scp=84959161776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959161776&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84959161776

VL - 2015-January

SP - 3431

EP - 3435

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -