Joint learning of visual attributes, object classes and visual saliency

Gang Wang, David Forsyth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a method to learn visual attributes (e.g., "red", "metal", "spotted") and object classes (e.g., "car", "dress", "umbrella") together. We assume images are labeled with the category, but not the location, of an instance. We estimate models with an iterative procedure: the current model is used to produce a saliency score, which, together with a homogeneity cue, identifies likely locations for the object (resp. attribute); those locations are then used to produce better models with multiple instance learning. Crucially, the object and attribute models must agree on the potential locations of an object. This means that the more accurate of the two models can guide the improvement of the less accurate model. Our method is evaluated on two data sets of images of real scenes, one in which the attribute is color and the other in which it is material. We show that our joint learning produces improved detectors. We demonstrate generalization by detecting attribute-object pairs that do not appear in our training data. The iteration gives a significant improvement in performance.
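
The procedure described above lends itself to a short sketch. The following Python fragment is a minimal illustration, not the authors' implementation: the linear saliency model, the multiplicative fusion of the object, attribute, and homogeneity cues, and the toy multiple-instance update are all hypothetical stand-ins chosen only to make the control flow of the iteration concrete.

# Minimal sketch of the iterative joint-learning loop described in the
# abstract. This is NOT the paper's code: the features, the saliency and
# homogeneity cues, and the MIL step below are hypothetical placeholders.
import numpy as np

def saliency_scores(model_w, feats):
    # Hypothetical linear model: one saliency score per candidate window.
    return feats @ model_w

def select_windows(obj_scores, attr_scores, homogeneity, k=5):
    # Key constraint from the abstract: the object and attribute models must
    # agree on likely locations, so their cues are fused multiplicatively.
    combined = obj_scores * attr_scores * homogeneity
    return np.argsort(combined)[-k:]

def mil_update(model_w, feats, picked, lr=0.1):
    # Toy stand-in for multiple instance learning: move the model toward the
    # mean feature of the currently selected (most salient) windows.
    return model_w + lr * feats[picked].mean(axis=0)

def joint_learning(bags, obj_w, attr_w, n_iters=5):
    # bags: one (window_features, homogeneity_cue) pair per weakly labeled
    # image; labels give the category of an instance but not its location.
    for _ in range(n_iters):
        for feats, homogeneity in bags:
            picked = select_windows(saliency_scores(obj_w, feats),
                                    saliency_scores(attr_w, feats),
                                    homogeneity)
            # Because the fused score is shared, the more accurate model
            # steers the weaker one toward better instance locations.
            obj_w = mil_update(obj_w, feats, picked)
            attr_w = mil_update(attr_w, feats, picked)
    return obj_w, attr_w

# Synthetic usage: 10 images, each with 50 candidate windows of 64-D features.
rng = np.random.default_rng(0)
bags = [(rng.normal(size=(50, 64)), rng.uniform(size=50)) for _ in range(10)]
obj_w, attr_w = joint_learning(bags, rng.normal(size=64), rng.normal(size=64))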

Original language: English (US)
Title of host publication: 2009 IEEE 12th International Conference on Computer Vision, ICCV 2009
Pages: 537-544
Number of pages: 8
DOI: 10.1109/ICCV.2009.5459194
State: Published - Dec 1 2009
Event: 12th International Conference on Computer Vision, ICCV 2009 - Kyoto, Japan
Duration: Sep 29 2009 - Oct 2 2009

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision

Other

Other: 12th International Conference on Computer Vision, ICCV 2009
Country: Japan
City: Kyoto
Period: 9/29/09 - 10/2/09

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Cite this

Wang, G., & Forsyth, D. (2009). Joint learning of visual attributes, object classes and visual saliency. In 2009 IEEE 12th International Conference on Computer Vision, ICCV 2009 (pp. 537-544). [5459194] (Proceedings of the IEEE International Conference on Computer Vision). https://doi.org/10.1109/ICCV.2009.5459194

@inproceedings{a2eed715cde84eb5a81b368417563f89,
  title     = "Joint learning of visual attributes, object classes and visual saliency",
  author    = "Gang Wang and David Forsyth",
  year      = "2009",
  month     = dec,
  doi       = "10.1109/ICCV.2009.5459194",
  language  = "English (US)",
  isbn      = "9781424444205",
  series    = "Proceedings of the IEEE International Conference on Computer Vision",
  pages     = "537--544",
  booktitle = "2009 IEEE 12th International Conference on Computer Vision, ICCV 2009",
}
