TY - GEN
T1 - 3D spatial recognition without spatially labeled 3D
AU - Ren, Zhongzheng
AU - Misra, Ishan
AU - Schwing, Alexander G.
AU - Girdhar, Rohit
N1 - Funding Information:
This work was supported in part by NSF under Grants #1718221, #2008387, and MRI #1725729, and NIFA award 2020-67021-32799. The authors thank Zaiwei Zhang and the Facebook AI team for helpful discussions and feedback.
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self and cross-task consistency losses. We show that in conjunction with standard multiple-instance learning objectives, WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time. We demonstrate its efficacy using the ScanNet and S3DIS datasets, outperforming prior state of the art on weakly-supervised segmentation by more than 6% mIoU. In addition, we set up the first benchmark for weakly-supervised 3D object detection on both datasets, where WyPR outperforms standard approaches and establishes strong baselines for future work.
AB - We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self and cross-task consistency losses. We show that in conjunction with standard multiple-instance learning objectives, WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time. We demonstrate its efficacy using the ScanNet and S3DIS datasets, outperforming prior state of the art on weakly-supervised segmentation by more than 6% mIoU. In addition, we set up the first benchmark for weakly-supervised 3D object detection on both datasets, where WyPR outperforms standard approaches and establishes strong baselines for future work.
UR - http://www.scopus.com/inward/record.url?scp=85121413002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121413002&partnerID=8YFLogxK
U2 - 10.1109/CVPR46437.2021.01300
DO - 10.1109/CVPR46437.2021.01300
M3 - Conference contribution
AN - SCOPUS:85121413002
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 13199
EP - 13208
BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
PB - IEEE Computer Society
T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Y2 - 19 June 2021 through 25 June 2021
ER -