TY - GEN
T1 - Street TryOn
T2 - 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
AU - Cui, Aiyu
AU - Mahajan, Jay
AU - Shah, Viraj
AU - Gomathinayagam, Preeti
AU - Liu, Chang
AU - Lazebnik, Svetlana
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a Street-TryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, by a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks.
AB - Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a Street-TryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, by a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks.
KW - diffusion models
KW - image generation
KW - virtual try-on
UR - http://www.scopus.com/inward/record.url?scp=105003631186&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105003631186&partnerID=8YFLogxK
U2 - 10.1109/WACV61041.2025.00145
DO - 10.1109/WACV61041.2025.00145
M3 - Conference contribution
AN - SCOPUS:105003631186
T3 - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
SP - 1414
EP - 1423
BT - Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 February 2025 through 4 March 2025
ER -