TY - JOUR
T1 - Superparsing
T2 - Scalable nonparametric image parsing with superpixels
AU - Tighe, Joseph
AU - Lazebnik, Svetlana
N1 - Acknowledgements This research was supported in part by NSF grants IIS-0845629 and IIS-0916829, DARPA Computer Science Study Group, Microsoft Research Faculty Fellowship, and Xerox.
PY - 2013/1
Y1 - 2013/1
N2 - This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach is based on lazy learning, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. Given a test image, it first performs global scene-level matching against the training set, followed by superpixel-level matching and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 45,676 images and 232 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Finally, we present an extension of our method to video sequences and report results on a video dataset with frames densely labeled at 1 Hz.
AB - This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach is based on lazy learning, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. Given a test image, it first performs global scene-level matching against the training set, followed by superpixel-level matching and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 45,676 images and 232 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Finally, we present an extension of our method to video sequences and report results on a video dataset with frames densely labeled at 1 Hz.
KW - Image parsing
KW - Image segmentation
KW - Scene understanding
UR - http://www.scopus.com/inward/record.url?scp=84873190838&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84873190838&partnerID=8YFLogxK
U2 - 10.1007/s11263-012-0574-z
DO - 10.1007/s11263-012-0574-z
M3 - Article
AN - SCOPUS:84873190838
SN - 0920-5691
VL - 101
SP - 329
EP - 349
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 2
ER -