Abstract
Deep neural network (DNN)-based speech enhancement models often struggle to maintain their performance for speakers not encountered during training. This challenge is exacerbated in applications such as enhancement and bandwidth extension of bone-conducted speech, where the distortion is closely correlated with speaker-specific characteristics. We address this issue by introducing a bottleneck module aimed at disentangling speaker-specific characteristics from speech content in speech enhancement DNNs. A DNN model is trained for enhancement of bone-conducted speech and modified with the proposed bottleneck module. We evaluate the DNN’s adaptability to unseen speakers by fine-tuning the network with a limited amount of adaptation data. The results show that the proposed bottleneck module can improve adaptation to unseen speakers, especially when only a limited amount of speaker-specific adaptation data is available.
Original language | English (US) |
---|---|
Title of host publication | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | IEEE |
Pages | 10456-10460 |
Number of pages | 5 |
ISBN (Print) | 9798350344868 |
State | Published - Apr 19 2024 |
Event | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Seoul, Korea, Republic of; Duration: Apr 14 2024 → Apr 19 2024 |
Conference
Conference | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
---|---|
Period | 4/14/24 → 4/19/24 |
Keywords
- Training
- Adaptation models
- Acoustic distortion
- Correlation
- Bandwidth
- Artificial neural networks
- Speech enhancement