TY - GEN
T1 - Collaborative speech dereverberation
T2 - 26th European Signal Processing Conference, EUSIPCO 2018
AU - Wager, Sanna
AU - Kim, Minje
N1 - Publisher Copyright: © EURASIP 2018.
PY - 2018/11/29
Y1 - 2018/11/29
N2 - We propose a regularized nonnegative tensor factorization (NTF) model for multi-channel speech dereverberation that incorporates prior knowledge about clean speech. The approach models the problem as recovering a signal convolved with different room impulse responses, allowing the dereverberation problem to benefit from microphone arrays. The factorization learns both individual reverberation filters and channel-specific delays, which makes it possible to employ an ad-hoc microphone array with heterogeneous sensors (such as multi-channel recordings by a crowd) even if they are not synchronized. We integrate two prior-knowledge regularization schemes to increase the stability of dereverberation performance. First, a Nonnegative Matrix Factorization (NMF) inner routine is introduced to inform the original NTF problem of the pre-trained clean speech basis vectors, so that the optimization process can focus on estimating their activations rather than the whole clean speech spectra. Second, the NMF activation matrix is further regularized to take on characteristics of dry signals using sparsity and smoothness constraints. Empirical dereverberation results on different simulated reverberation setups show that the prior-knowledge regularization schemes improve both recovered sound quality and speech intelligibility compared to a baseline NTF approach.
KW - Collaborative audio enhancement
KW - multi-channel dereverberation
KW - Nonnegative matrix factorization
KW - Nonnegative tensor factorization
KW - Speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85059821003&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059821003&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO.2018.8553565
DO - 10.23919/EUSIPCO.2018.8553565
M3 - Conference contribution
AN - SCOPUS:85059821003
T3 - European Signal Processing Conference
SP - 1532
EP - 1536
BT - 2018 26th European Signal Processing Conference, EUSIPCO 2018
PB - European Signal Processing Conference, EUSIPCO
Y2 - 3 September 2018 through 7 September 2018
ER -