TY - GEN
T1 - Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
AU - Pang, Ziqi
AU - Li, Jie
AU - Tokmakov, Pavel
AU - Chen, Dian
AU - Zagoruyko, Sergey
AU - Wang, Yu-Xiong
N1 - Acknowledgement. This work was supported in part by Toyota Research Institute, NSF Grant 2106825, NIFA Award 2020-67021-32799, and the NCSA Fellows program.
PY - 2023
Y1 - 2023
AB - This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it 'Past-and-Future reasoning for Tracking' (PF-Track). Specifically, our method adopts the 'tracking by attention' framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our 'Past Reasoning' module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The 'Future Reasoning' module digests historical information and predicts robust future trajectories. In the case of long-term occlusions, our method maintains the object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, our method improves AMOTA by a large margin and remarkably reduces ID-Switches by 90% compared to prior approaches, an order of magnitude fewer. The code and models are made available at https://github.com/TRI-ML/PF-Track.
KW - Autonomous driving
UR - http://www.scopus.com/inward/record.url?scp=85164428828&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164428828&partnerID=8YFLogxK
U2 - 10.1109/CVPR52729.2023.01719
DO - 10.1109/CVPR52729.2023.01719
M3 - Conference contribution
AN - SCOPUS:85164428828
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 17928
EP - 17938
BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
PB - IEEE Computer Society
Y2 - 18 June 2023 through 22 June 2023
ER -