TY - GEN
T1 - UPMIXING VIA STYLE TRANSFER
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
AU - Yang, Haici
AU - Wager, Sanna
AU - Russell, Spencer
AU - Luo, Mike
AU - Kim, Minje
AU - Kim, Wontak
N1 - Publisher Copyright: © 2022 IEEE
PY - 2022
Y1 - 2022
AB - In the stereo-to-multichannel upmixing problem for music, one of the main tasks is to set the directionality of the instrument sources in the multichannel rendering results. In this paper, we propose a modified variational autoencoder model that learns a latent space to describe the spatial images in multichannel music. We seek to disentangle the spatial images and music content, so the learned latent variables are invariant to the music. At test time, we use the latent variables to control the panning of sources. We propose two upmixing use cases: transferring the spatial images from one song to another and blind panning based on the generative model. We report objective and subjective evaluation results to empirically show that our model captures spatial images separately from music content and achieves transfer-based interactive panning.
KW - information disentanglement
KW - panning
KW - Stereo-to-multichannel upmixing
KW - variational autoencoders
UR - http://www.scopus.com/inward/record.url?scp=85131232604&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131232604&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746978
DO - 10.1109/ICASSP43922.2022.9746978
M3 - Conference contribution
AN - SCOPUS:85131232604
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 426
EP - 430
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 May 2022 through 27 May 2022
ER -