Consistent Multimodal Generation via A Unified GAN Framework

Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soren Pirk, Derek Hoiem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are both realistic and consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also show a training recipe that easily extends our pretrained model to a new domain, even with only a few paired samples. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be made available.
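The layout described in the abstract (one shared backbone, one output branch per modality, one fidelity discriminator per modality, and one consistency discriminator that sees all modalities together) can be illustrated with a toy sketch. This is not the authors' implementation: every class and function name below is hypothetical, each "image" is reduced to a single pixel vector, the layers are random linear maps rather than StyleGAN3 blocks, and the non-saturating loss and the `consistency_weight` parameter are assumptions chosen only to show how the terms might be combined.

```python
import math
import random

# Hypothetical toy setup: one "pixel" per modality instead of a full image.
MODALITIES = ("rgb", "depth", "normal")
CHANNELS = {"rgb": 3, "depth": 1, "normal": 3}


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def rand_matrix(rng, rows, cols):
    """Random weights standing in for trained layers."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class ToyGenerator:
    """Shared backbone followed by one output branch per modality."""

    def __init__(self, rng, latent_dim=8, feat_dim=16):
        self.backbone = rand_matrix(rng, feat_dim, latent_dim)
        self.branches = {
            m: rand_matrix(rng, CHANNELS[m], feat_dim) for m in MODALITIES
        }

    def __call__(self, z):
        # Every modality is decoded from the same shared features.
        feat = [math.tanh(h) for h in linear(self.backbone, z)]
        return {m: linear(self.branches[m], feat) for m in MODALITIES}


class ToyDiscriminator:
    """Maps an input vector to a single realism logit."""

    def __init__(self, rng, in_dim):
        self.w = rand_matrix(rng, 1, in_dim)

    def __call__(self, x):
        return linear(self.w, x)[0]


def generator_loss(outputs, fidelity_ds, consistency_d, consistency_weight=1.0):
    """Assumed combination: per-modality fidelity terms plus one joint term."""
    fidelity = sum(softplus(-fidelity_ds[m](outputs[m])) for m in MODALITIES)
    # The consistency discriminator scores all modalities concatenated.
    joint = [v for m in MODALITIES for v in outputs[m]]
    consistency = softplus(-consistency_d(joint))
    return fidelity + consistency_weight * consistency


rng = random.Random(0)
generator = ToyGenerator(rng)
fidelity_ds = {m: ToyDiscriminator(rng, CHANNELS[m]) for m in MODALITIES}
consistency_d = ToyDiscriminator(rng, sum(CHANNELS.values()))

z = [rng.gauss(0.0, 1.0) for _ in range(8)]
outputs = generator(z)
loss = generator_loss(outputs, fidelity_ds, consistency_d)
print({m: len(v) for m, v in outputs.items()}, round(loss, 4))
```

The point of the sketch is the data flow: a single latent code yields all three outputs, each output is judged on its own by a fidelity discriminator, and the concatenated outputs are judged jointly so that mismatched RGB, depth, and normal predictions can be penalized.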

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5036-5045
Number of pages: 10
ISBN (Electronic): 9798350318920
DOIs
State: Published - Jan 3 2024
Event: 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024 - Waikoloa, United States
Duration: Jan 4 2024 to Jan 8 2024

Publication series

Name: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024

Conference

Conference: 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Country/Territory: United States
City: Waikoloa
Period: 1/4/24 to 1/8/24

Keywords

  • Algorithms: Generative models for image, video, 3D, etc.
  • Algorithms: Computational photography, image and video synthesis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
