Abstract
The reverberation time, T60, is an important indicator of the reverberation strength in a room and has many applications in speech processing, such as dereverberation. However, T60 must be estimated blindly if only reverberant speech is available. In this paper, we provide a learning-based approach for T60 estimation. We treat T60 estimation as a classification problem by dividing the T60 range into a finite number of bins (e.g. 19 bins covering 0.1 s to 1 s with a bin width of 0.05 s), so that estimation becomes predicting which bin the true T60 falls into for a given speech utterance. We use a deep neural network (DNN) to learn this mapping from speech to T60. The DNN is trained on a large set of reverberant and noisy speech signals generated from various simulated rooms with known T60 values. After training, we observe that the DNN learns highly sensible features for the T60 estimation task. Experimental results on data from both simulated and real rooms confirm the effectiveness of the DNN learning-based approach. In all test cases, the DNN method significantly outperforms the state-of-the-art SDD T60 estimation method.
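The classification framing above can be illustrated with a small sketch of how continuous T60 values become class labels and how a classifier's output is turned back into a T60 estimate. It assumes the 19 bins are centred on 0.10, 0.15, ..., 1.00 s (width 0.05 s), matching the example in the abstract; the paper's exact bin edges may differ. The helper names (`t60_to_bin`, `bin_to_t60`, `decode`) are hypothetical, and the DNN itself is not shown.

```python
"""Sketch: T60 estimation cast as classification over fixed-width bins.

Assumption (not from the paper): bin centres are 0.10, 0.15, ..., 1.00 s.
"""

T60_MIN = 0.10    # s, centre of the first bin (assumed)
BIN_WIDTH = 0.05  # s, bin width from the abstract's example
NUM_BINS = 19     # number of classes from the abstract's example


def bin_centres():
    """Return the T60 value (s) represented by each class."""
    return [round(T60_MIN + k * BIN_WIDTH, 2) for k in range(NUM_BINS)]


def t60_to_bin(t60):
    """Map a known T60 (s) to its training label (nearest bin centre).

    Values outside the covered range are clipped to the first or last bin.
    Exact ties between two centres follow Python's round-half-to-even rule.
    """
    k = round((t60 - T60_MIN) / BIN_WIDTH)
    return min(max(k, 0), NUM_BINS - 1)


def bin_to_t60(k):
    """Map a class index back to the T60 value (s) at its bin centre."""
    return round(T60_MIN + k * BIN_WIDTH, 2)


def decode(probs):
    """Turn a classifier's per-bin probabilities into a T60 estimate (argmax)."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return bin_to_t60(k)


if __name__ == "__main__":
    print(t60_to_bin(0.37), bin_to_t60(t60_to_bin(0.37)))  # label 5 -> 0.35 s
```

Taking the argmax is the simplest decoding rule; a probability-weighted average of the bin centres is an alternative that can give estimates between centres, but nothing in the abstract indicates which rule the authors used.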
Original language | English (US) |
---|---|
Pages (from-to) | 3431-3435 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2015-January |
State | Published - 2015 |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany (Sep 6, 2015 → Sep 10, 2015) |
Keywords
- Deep learning
- Deep neural networks
- Dereverberation
- Robust reverberation time estimation
- T60 estimation
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation