TY - GEN
T1 - Embracing Single Stride 3D Object Detector with Sparse Transformer
AU - Fan, Lue
AU - Pang, Ziqi
AU - Zhang, Tianyuan
AU - Wang, Yu-Xiong
AU - Zhao, Hang
AU - Wang, Feng
AU - Wang, Naiyan
AU - Zhang, Zhaoxiang
N1 - This work was supported in part by the Major Project for New Generation of AI (No. 2018AAA0100400), the National Natural Science Foundation of China (No. 61836014, No. U21B2042, No. 62072457, No. 62006231), and in part by the TuSimple Collaborative Research Project.
PY - 2022
Y1 - 2022
AB - In LiDAR-based 3D object detection for autonomous driving, the ratio of object size to input scene size is significantly smaller than in 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how such a multi-stride stereotype affects LiDAR-based 3D object detectors. Our experiments point out that the downsampling operations bring few advantages and lead to inevitable information loss. To remedy this issue, we propose the Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network. Armed with transformers, our method addresses the problem of insufficient receptive field in single-stride architectures. It also cooperates well with the sparsity of point clouds and naturally avoids expensive computation. Our SST achieves state-of-the-art results on the large-scale Waymo Open Dataset. Notably, thanks to its single-stride design, our method achieves strong performance on small-object (pedestrian) detection, reaching 83.8 LEVEL_1 AP on the validation split. Our code will be made public.
KW - 3D from multi-view and sensors
KW - Navigation and autonomous driving
UR - http://www.scopus.com/inward/record.url?scp=85131016512&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131016512&partnerID=8YFLogxK
U2 - 10.1109/CVPR52688.2022.00827
DO - 10.1109/CVPR52688.2022.00827
M3 - Conference contribution
AN - SCOPUS:85131016512
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 8448
EP - 8458
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Y2 - 19 June 2022 through 24 June 2022
ER -