TY - GEN
T1 - Discovering Objects that Can Move
AU - Bao, Zhipeng
AU - Tokmakov, Pavel
AU - Jabri, Allan
AU - Wang, Yu-Xiong
AU - Gaidon, Adrien
AU - Hebert, Martial
N1 - Funding Information:
Acknowledgements. We thank Alexei Efros, Vitor Guizilini and Jie Li for their valuable comments, and Achal Dave for his help with computing motion segmentations. This research was supported by Toyota Research Institute.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - This paper studies the problem of object discovery - separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. However, by relying on appearance alone, these methods fail to separate objects from the background in cluttered scenes. This is a fundamental limitation since the definition of an object is inherently ambiguous and context-dependent. To resolve this ambiguity, we choose to focus on dynamic objects - entities that can move independently in the world. We then scale the recent auto-encoder based frameworks for unsupervised object discovery from toy synthetic images to complex real-world scenes. To this end, we simplify their architecture, and augment the resulting model with a weak learning signal from general motion segmentation algorithms. Our experiments demonstrate that, despite only capturing a small subset of the objects that move, this signal is enough to generalize to segment both moving and static instances of dynamic objects. We show that our model scales to a newly collected, photo-realistic synthetic dataset with street driving scenarios. Additionally, we leverage ground truth segmentation and flow annotations in this dataset for thorough ablation and evaluation. Finally, our experiments on the real-world KITTI benchmark demonstrate that the proposed approach outperforms both heuristic- and learning-based methods by capitalizing on motion cues.
KW - Motion and tracking
KW - Representation learning
KW - Segmentation, grouping and shape analysis
KW - Self- & semi- & meta-learning
KW - Video analysis and understanding
UR - http://www.scopus.com/inward/record.url?scp=85135486694&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135486694&partnerID=8YFLogxK
U2 - 10.1109/CVPR52688.2022.01149
DO - 10.1109/CVPR52688.2022.01149
M3 - Conference contribution
AN - SCOPUS:85135486694
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 11779
EP - 11788
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Y2 - 19 June 2022 through 24 June 2022
ER -