TY - GEN
T1 - Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors
AU - Zhong, Yuanyi
AU - Bhattad, Anand
AU - Wang, Yu-Xiong
AU - Forsyth, David
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Dense depth and surface normal predictors should be equivariant to cropping-and-resizing: cropping the input image should produce a correspondingly cropped output. However, we find that state-of-the-art depth and normal predictors, despite their strong performance, surprisingly do not respect equivariance. The problem persists even when crop-and-resize data augmentation is employed during training. To remedy this, we propose an equivariant regularization technique, consisting of an averaging procedure and a self-consistency loss, that explicitly promotes cropping-and-resizing equivariance in depth and normal networks. Our approach applies to both CNN and Transformer architectures, incurs no extra cost during testing, and notably improves the supervised and semi-supervised learning performance of dense predictors on Taskonomy tasks. Finally, finetuning with our loss on unlabeled images improves not only the equivariance but also the accuracy of state-of-the-art depth and normal predictors evaluated on NYU-v2.
AB - Dense depth and surface normal predictors should be equivariant to cropping-and-resizing: cropping the input image should produce a correspondingly cropped output. However, we find that state-of-the-art depth and normal predictors, despite their strong performance, surprisingly do not respect equivariance. The problem persists even when crop-and-resize data augmentation is employed during training. To remedy this, we propose an equivariant regularization technique, consisting of an averaging procedure and a self-consistency loss, that explicitly promotes cropping-and-resizing equivariance in depth and normal networks. Our approach applies to both CNN and Transformer architectures, incurs no extra cost during testing, and notably improves the supervised and semi-supervised learning performance of dense predictors on Taskonomy tasks. Finally, finetuning with our loss on unlabeled images improves not only the equivariance but also the accuracy of state-of-the-art depth and normal predictors evaluated on NYU-v2.
UR - http://www.scopus.com/inward/record.url?scp=85180397273&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85180397273&partnerID=8YFLogxK
U2 - 10.1109/ICCV51070.2023.01990
DO - 10.1109/ICCV51070.2023.01990
M3 - Conference contribution
AN - SCOPUS:85180397273
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 21718
EP - 21728
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Y2 - 2 October 2023 through 6 October 2023
ER -