Scene Parsing with Object Instance Inference Using Regions and Per-exemplar Detectors

Joseph Tighe, Marc Niethammer, Svetlana Lazebnik

Research output: Contribution to journal › Article › peer-review

Abstract

This paper describes a system for interpreting a scene by assigning a semantic label at every pixel and inferring the spatial extent of individual object instances together with their occlusion relationships. First, we present a method for labeling each pixel aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. This method combines region-level features with per-exemplar sliding window detectors. Unlike traditional bounding box detectors, per-exemplar detectors perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. Next, we use per-exemplar detections to generate a set of candidate object masks for a given test image. We then select a subset of objects that explain the image well and have valid overlap relationships and occlusion ordering. This is done by minimizing the objective of an integer quadratic program, using either a greedy method or a standard solver. We alternate between using the object predictions to refine the pixel labels and using the pixel labels to improve the object predictions. The proposed system obtains promising results on two challenging subsets of the LabelMe dataset, the larger of which contains 45,676 images and 232 classes.
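
The object-selection step can be illustrated with a minimal sketch. The abstract does not state the objective, so the code below assumes a generic quadratic cost over binary selections: a per-candidate (unary) cost plus a pairwise penalty for candidates that conflict. The names `objective`, `greedy_select`, `unary`, and `pairwise`, and the toy numbers, are hypothetical and are not taken from the paper. The sketch shows only the general idea of greedy minimization and leaves out overlap validity and occlusion ordering.

```python
def objective(selected, unary, pairwise):
    """Assumed cost: sum of unary costs plus pairwise penalties over the selection."""
    cost = sum(unary[i] for i in selected)
    cost += sum(pairwise[i][j] for i in selected for j in selected if i < j)
    return cost


def greedy_select(unary, pairwise):
    """Repeatedly add the candidate that lowers the cost most; stop when none does."""
    selected = []
    remaining = set(range(len(unary)))
    current = 0.0  # cost of the empty selection
    while remaining:
        best, best_cost = None, current
        for i in sorted(remaining):
            cost = objective(selected + [i], unary, pairwise)
            if cost < best_cost:
                best, best_cost = i, cost
        if best is None:  # no remaining candidate improves the objective
            break
        selected.append(best)
        remaining.remove(best)
        current = best_cost
    return selected, current


# Hypothetical toy data: three candidate masks.
# A negative unary cost means the candidate explains the image well.
unary = [-3.0, -2.0, -1.0]
# Symmetric penalties; candidates 0 and 1 conflict strongly.
pairwise = [
    [0.0, 5.0, 0.0],
    [5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
selected, cost = greedy_select(unary, pairwise)
print(selected, cost)
```

On this toy input the greedy pass keeps candidates 0 and 2 and rejects candidate 1, because adding it would incur the conflict penalty with candidate 0. A greedy pass of this kind is fast but is not guaranteed to reach the global minimum, which is one reason to compare it against a standard solver.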

Original language: English (US)
Pages (from-to): 150-171
Number of pages: 22
Journal: International Journal of Computer Vision
Volume: 112
Issue number: 2
State: Published - Apr 2015

Keywords

  • Image parsing
  • Object segmentation
  • Scene understanding
  • Semantic segmentation

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
