TY - GEN
T1 - TorontoCity
T2 - 16th IEEE International Conference on Computer Vision, ICCV 2017
AU - Wang, Shenlong
AU - Bai, Min
AU - Mattyus, Gellert
AU - Chu, Hang
AU - Luo, Wenjie
AU - Yang, Bin
AU - Liang, Justin
AU - Cheverie, Joel
AU - Fidler, Sanja
AU - Urtasun, Raquel
N1 - Funding Information:
This work was performed at the University of Toronto and supported by NSERC, CFI, ORF, ERA, and CRC. We thank NVIDIA for donating GPUs.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/12/22
Y1 - 2017/12/22
AB - In this paper we introduce the TorontoCity benchmark, which covers the full Greater Toronto Area (GTA) with 712.5 km² of land, 8439 km of road and around 400,000 buildings. Our benchmark provides different perspectives of the world captured from airplanes, drones and cars driving around the city. Manually labeling such a large-scale dataset is infeasible. Instead, we propose to utilize different sources of high-precision maps to create our ground truth. Towards this goal, we develop algorithms that allow us to align all data sources with the maps while requiring minimal human supervision. We have designed a wide variety of tasks including building height estimation (reconstruction), road centerline and curb extraction, building instance segmentation, building contour extraction (reorganization), semantic labeling and scene type classification (recognition). Our pilot study shows that most of these tasks are still difficult for modern convolutional neural networks.
UR - http://www.scopus.com/inward/record.url?scp=85041915243&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041915243&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2017.327
DO - 10.1109/ICCV.2017.327
M3 - Conference contribution
AN - SCOPUS:85041915243
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 3028
EP - 3036
BT - Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 October 2017 through 29 October 2017
ER -