Sensor fusion for semantic segmentation of urban scenes

Richard Yi Zhang, Stefan A. Candra, Kai Vetter, Avideh Zakhor

Research output: Contribution to journal › Conference article › peer-review


Semantic understanding of environments is an important problem in robotics in general and intelligent autonomous systems in particular. In this paper, we propose a semantic segmentation algorithm which effectively fuses information from images and 3D point clouds. The proposed method incorporates information from multiple scales in an intuitive and effective manner. A late-fusion architecture is proposed to maximally leverage the training data in each modality. Finally, a pairwise Conditional Random Field (CRF) is used as a post-processing step to enforce spatial consistency in the structured prediction. The proposed algorithm is evaluated on the publicly available KITTI dataset [1] [2], augmented with additional pixel- and point-wise semantic labels for building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence regions. A per-pixel accuracy of 89.3% and an average class accuracy of 65.4% are achieved, well above the current state of the art [3].
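The late-fusion idea in the abstract, where the image and point-cloud modalities are each classified independently and their per-class scores combined afterwards, can be sketched as follows. This is an illustrative sketch only: the class list comes from the abstract, but the weighted-average combination rule, the `late_fuse` function, and its parameters are assumptions for illustration, not the paper's actual classifiers or fusion weights.

```python
# Hypothetical sketch of a late-fusion step: each modality produces
# per-class scores for a pixel, and the scores are combined only at the
# end. The weighted average below is an assumed combination rule, not
# the method from the paper.

# Semantic classes listed in the abstract (KITTI-derived labels).
CLASSES = ["building", "sky", "road", "vegetation", "sidewalk",
           "car", "pedestrian", "cyclist", "sign/pole", "fence"]

def late_fuse(image_scores, cloud_scores, w_image=0.5):
    """Fuse two per-class score dicts and return the winning label.

    image_scores, cloud_scores: dicts mapping class name -> score.
    Pixels without a 3D return can pass an empty cloud_scores dict,
    in which case only the image modality contributes.
    """
    fused = {}
    for cls in CLASSES:
        img = image_scores.get(cls, 0.0)
        if cloud_scores:
            pcl = cloud_scores.get(cls, 0.0)
            # Assumed rule: convex combination of the two modalities.
            fused[cls] = w_image * img + (1.0 - w_image) * pcl
        else:
            fused[cls] = img
    return max(fused, key=fused.get)

# Example: the point cloud strongly suggests "car" and overrides a
# weak image-based preference for "road".
label = late_fuse({"road": 0.6, "car": 0.4}, {"car": 0.9, "road": 0.1})
```

One advantage of this structure, as the abstract notes, is that each modality's classifier can be trained on all of the data available for that modality, even where the other modality has no labels or no coverage; a spatial-smoothness step such as the paper's pairwise CRF would then be applied on top of these fused unary scores.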

Original language: English (US)
Article number: 7139439
Pages (from-to): 1850-1857
Number of pages: 8
Journal: Proceedings - IEEE International Conference on Robotics and Automation
Issue number: June
State: Published - Jun 29, 2015
Externally published: Yes
Event: 2015 IEEE International Conference on Robotics and Automation, ICRA 2015 - Seattle, United States
Duration: May 26, 2015 - May 30, 2015

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering


