TY - GEN
T1 - Joint acoustic and spectral modeling for speech dereverberation using non-negative representations
AU - Mohammadiha, Nasser
AU - Smaragdis, Paris
AU - Doclo, Simon
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/8/4
Y1 - 2015/8/4
N2 - This paper proposes a single-channel speech dereverberation method enhancing the spectrum of the reverberant speech signal. The proposed method uses a non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech signal and the room impulse response (RIR). To utilize the speech spectral structure, we propose to model the speech spectrum using non-negative matrix factorization, which is directly used in the N-CTF model resulting in a new cost function. We derive new estimators for the parameters by minimizing the obtained cost function. Additionally, to investigate the effect of the speech temporal dynamics for dereverberation, we use a frame stacking method and derive optimal estimators. Experiments are performed for two measured RIRs and the performance of the proposed method is compared to the performance of a state-of-the-art dereverberation method enhancing the speech spectrum. Experimental results show that the proposed method improved instrumental speech quality measures, where using speech temporal dynamics was found to be beneficial in severe reverberation conditions.
AB - This paper proposes a single-channel speech dereverberation method enhancing the spectrum of the reverberant speech signal. The proposed method uses a non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech signal and the room impulse response (RIR). To utilize the speech spectral structure, we propose to model the speech spectrum using non-negative matrix factorization, which is directly used in the N-CTF model resulting in a new cost function. We derive new estimators for the parameters by minimizing the obtained cost function. Additionally, to investigate the effect of the speech temporal dynamics for dereverberation, we use a frame stacking method and derive optimal estimators. Experiments are performed for two measured RIRs and the performance of the proposed method is compared to the performance of a state-of-the-art dereverberation method enhancing the speech spectrum. Experimental results show that the proposed method improved instrumental speech quality measures, where using speech temporal dynamics was found to be beneficial in severe reverberation conditions.
KW - Non-negative convolutive transfer function
KW - dictionary-based processing
KW - non-negative matrix factorization
UR - http://www.scopus.com/inward/record.url?scp=84946075722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946075722&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2015.7178804
DO - 10.1109/ICASSP.2015.7178804
M3 - Conference contribution
AN - SCOPUS:84946075722
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4410
EP - 4414
BT - 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Y2 - 19 April 2014 through 24 April 2014
ER -