TY - JOUR
T1 - Manhattan Room Layout Reconstruction from a Single 360° Image
T2 - A Comparative Study of State-of-the-Art Methods
AU - Zou, Chuhang
AU - Su, Jheng-Wei
AU - Peng, Chi-Han
AU - Colburn, Alex
AU - Shan, Qi
AU - Wonka, Peter
AU - Chu, Hung-Kuo
AU - Hoiem, Derek
N1 - Funding Information:
This research is supported in part by ONR MURI Grant N00014-16-1-2007, an iStaging Corp. fund, and the Ministry of Science and Technology of Taiwan (108-2218-E-007-050- and 107-2221-E-007-088-MY3). We thank Shang-Ta Yang for providing the source code of DuLa-Net. We thank Cheng Sun for providing the source code of HorizonNet and for helping run experiments on our provided dataset.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature.
PY - 2021/5
Y1 - 2021/5
N2 - Recent approaches for predicting layouts from 360° panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g., SegNet or ResNet), type of elements predicted (e.g., corners, wall/floor boundaries, or semantic segmentation), or method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset (Chang et al.: Matterport3D: Learning from RGB-D data in indoor environments. arXiv:1709.06158, 2017), and introduce two depth-based evaluation metrics.
AB - Recent approaches for predicting layouts from 360° panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g., SegNet or ResNet), type of elements predicted (e.g., corners, wall/floor boundaries, or semantic segmentation), or method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset (Chang et al.: Matterport3D: Learning from RGB-D data in indoor environments. arXiv:1709.06158, 2017), and introduce two depth-based evaluation metrics.
KW - 3D room layout
KW - Deep learning
KW - Manhattan world
KW - Single image 3D
UR - http://www.scopus.com/inward/record.url?scp=85100805936&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100805936&partnerID=8YFLogxK
U2 - 10.1007/s11263-020-01426-8
DO - 10.1007/s11263-020-01426-8
M3 - Article
AN - SCOPUS:85100805936
SN - 0920-5691
VL - 129
SP - 1410
EP - 1431
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 5
ER -