TY - GEN
T1 - Deep learning for monaural speech separation
AU - Huang, Po Sen
AU - Kim, Minje
AU - Hasegawa-Johnson, Mark
AU - Smaragdis, Paris
PY - 2014
Y1 - 2014
AB - Monaural source separation is useful for many real-world applications, though it is a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose the joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint. Moreover, we explore a discriminative training criterion for the neural networks to further enhance the separation performance. We evaluate our approaches using the TIMIT speech corpus for a monaural speech separation task. Our proposed models achieve about 3.8 to 4.9 dB SIR gain compared to NMF models, while maintaining better SDRs and SARs.
KW - Deep Learning
KW - Monaural Source Separation
KW - Time-Frequency Masking
UR - http://www.scopus.com/inward/record.url?scp=84905240926&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905240926&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2014.6853860
DO - 10.1109/ICASSP.2014.6853860
M3 - Conference contribution
AN - SCOPUS:84905240926
SN - 9781479928927
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1562
EP - 1566
BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2 - 4 May 2014 through 9 May 2014
ER -