Abstract
This chapter conducts an in-depth exploration of the various methodologies employed in modelling and learning representations for dynamic 3D scenes. The discussion draws upon the intrinsic human ability to interact with and predict changes within the 3D physical world, with the aim of advancing embodied AI systems to reach and exceed these capabilities.
The focus lies on the examination of diverse 3D representation methods, encompassing keypoints, particles, and neural fields among others. Their respective strengths, limitations, and applications within embodied AI tasks are thoroughly investigated. Highlights include the self-supervised learning of 3D object keypoints for model-based predictions, the use of particles in conjunction with graph neural networks (GNNs) to model complex dynamics, and the recent surge in the utilisation of 3D neural fields in world models, using inductive bias.
This chapter demonstrates how these representation learning methods have been instrumental in accomplishing complex control and planning tasks in the 3D world, thus significantly propelling AI capabilities in both simulated and real-world conditions. Concluding with a look towards the future, this chapter outlines potential research directions to continue enhancing the performance and capabilities of embodied intelligence systems.
Original language | English (US) |
---|---|
Title of host publication | Deep Learning for 3D Vision |
Subtitle of host publication | Algorithms and Applications |
Editors | Xiaoli Li, Xulei Yang, Hao Su |
Publisher | World Scientific |
Pages | 91-158 |
Number of pages | 68 |
ISBN (Electronic) | 9789811286490 |
ISBN (Print) | 9789811286483 |
State | Published - Sep 2024 |