Counterfactual depth from a single RGB image

Theerasit Issaranon, Chuhang Zou, David Forsyth

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We describe a method that predicts, from a single RGB image, a depth map of the scene as it would appear with a masked object removed. We call this 'counterfactual depth': it models hidden scene geometry together with the observations. Our method works for the same reason that scene completion works: the spatial structure of objects is simple. But we offer a much higher resolution representation of space than current scene completion methods, as we operate at pixel-level precision and do not rely on a voxel representation. Furthermore, we do not require RGBD inputs. Our method uses a standard encoder-decoder architecture, with a decoder modified to accept an object mask. We describe a small evaluation dataset that we have collected, which allows inference about which factors affect reconstruction most strongly. Using this dataset, we show that our depth predictions for masked objects are better than those of baseline methods.
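
The abstract compares depth predictions specifically for masked objects. As a rough illustration of what a mask-restricted comparison can look like, the sketch below computes a root-mean-square error over only the pixels covered by the object mask. The choice of RMSE, the function name `masked_rmse`, and the toy values are all assumptions made for illustration; the abstract does not state which error measures the authors use.

```python
import math


def masked_rmse(pred, target, mask):
    """RMSE between two depth maps, restricted to pixels where mask is truthy.

    pred, target, mask are equally sized nested lists (rows of pixels).
    Hypothetical helper for illustration; not taken from the paper.
    """
    sq_err = 0.0
    count = 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:
                sq_err += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return math.sqrt(sq_err / count)


# Toy 2x3 depth maps (made-up values, in arbitrary depth units).
counterfactual_gt = [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]  # scene with the object removed
prediction = [[2.0, 2.5, 2.0], [3.0, 3.5, 9.0]]         # a model's predicted depth
object_mask = [[0, 1, 0], [0, 1, 0]]                    # 1 marks the removed object

# Only the two masked pixels contribute; the 9.0 outlier lies outside the mask.
error = masked_rmse(prediction, counterfactual_gt, object_mask)
```

Restricting the error to the mask isolates the region where the depth must be inferred rather than observed, which is the part of the prediction that distinguishes counterfactual depth from ordinary depth estimation.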

Original language: English (US)
Title of host publication: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2129-2138
Number of pages: 10
ISBN (Electronic): 9781728150239
State: Published - Oct 2019
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of
Duration: Oct 27 2019 to Oct 28 2019

Publication series

Name: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 10/27/19 to 10/28/19

Keywords

  • Depth prediction
  • Object removal

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
