TY - GEN
T1 - SPEECHSPLIT2.0
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
AU - Chan, Chak Ho
AU - Qian, Kaizhi
AU - Zhang, Yang
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright: © 2022 IEEE
PY - 2022
Y1 - 2022
AB - SPEECHSPLIT can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner. However, SPEECHSPLIT requires careful tuning of the autoencoder bottlenecks, which is time-consuming and can reduce robustness. This paper proposes SPEECHSPLIT2.0, which constrains, at the autoencoder input, the information flow of each speech component to be disentangled, using efficient signal processing methods instead of bottleneck tuning. Evaluation results show that SPEECHSPLIT2.0 achieves performance comparable to SPEECHSPLIT in speech disentanglement and superior robustness to variations in bottleneck size. Our code is available at https://github.com/biggytruck/SpeechSplit2.
KW - Speech Disentanglement
KW - Unsupervised Learning
KW - Voice Conversion
UR - http://www.scopus.com/inward/record.url?scp=85134074715&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134074715&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9747763
DO - 10.1109/ICASSP43922.2022.9747763
M3 - Conference contribution
AN - SCOPUS:85134074715
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5243
EP - 5247
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 May 2022 through 27 May 2022
ER -