TY - GEN
T1 - Consistent Multimodal Generation via A Unified GAN Framework
AU - Zhu, Zhen
AU - Li, Yijun
AU - Lyu, Weijie
AU - Singh, Krishna Kumar
AU - Shu, Zhixin
AU - Pirk, Soren
AU - Hoiem, Derek
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/1/3
Y1 - 2024/1/3
AB - We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also present a training recipe that easily extends our pretrained model to a new domain, even with only a small amount of paired data. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available here.
KW - Algorithms
KW - Computational photography, image and video synthesis
KW - Generative models for image, video, 3D, etc.
UR - http://www.scopus.com/inward/record.url?scp=85192008094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192008094&partnerID=8YFLogxK
U2 - 10.1109/WACV57701.2024.00497
DO - 10.1109/WACV57701.2024.00497
M3 - Conference contribution
AN - SCOPUS:85192008094
T3 - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
SP - 5036
EP - 5045
BT - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Y2 - 4 January 2024 through 8 January 2024
ER -