TY - GEN
T1 - Deep learning based speech beamforming
AU - Qian, Kaizhi
AU - Zhang, Yang
AU - Chang, Shiyu
AU - Yang, Xuesong
AU - Florencio, Dinei
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech model guided beamforming algorithms are able to recover natural sounding speech, but the speech models tend to be oversimplified or the inference would otherwise be too complicated. On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels. Also, deep learning approaches introduce a lot of errors, particularly in the presence of unseen noise types and settings. We have therefore proposed an enhancement framework called DEEPBEAM, which combines the two complementary classes of algorithms. DEEPBEAM introduces a beamforming filter to produce natural sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DEEPBEAM is able to produce clean, dry and natural sounding speech, and is robust against unseen noise.
AB - Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech model guided beamforming algorithms are able to recover natural sounding speech, but the speech models tend to be oversimplified or the inference would otherwise be too complicated. On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels. Also, deep learning approaches introduce a lot of errors, particularly in the presence of unseen noise types and settings. We have therefore proposed an enhancement framework called DEEPBEAM, which combines the two complementary classes of algorithms. DEEPBEAM introduces a beamforming filter to produce natural sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DEEPBEAM is able to produce clean, dry and natural sounding speech, and is robust against unseen noise.
KW - Ad-hoc sensors
KW - Beamforming
KW - Deep learning
KW - Multi-channel speech enhancement
KW - WaveNet
UR - http://www.scopus.com/inward/record.url?scp=85054203296&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054203296&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462430
DO - 10.1109/ICASSP.2018.8462430
M3 - Conference contribution
AN - SCOPUS:85054203296
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5389
EP - 5393
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -