TY - GEN
T1 - Joint learning of visual attributes, object classes and visual saliency
AU - Wang, Gang
AU - Forsyth, David
PY - 2009
Y1 - 2009
N2 - We present a method to learn visual attributes (e.g., "red", "metal", "spotted") and object classes (e.g., "car", "dress", "umbrella") together. We assume images are labeled with the category, but not the location, of an instance. We estimate models with an iterative procedure: the current model is used to produce a saliency score, which, together with a homogeneity cue, identifies likely locations for the object (resp. attribute); then those locations are used to produce better models with multiple instance learning. Crucially, the object and attribute models must agree on the potential locations of an object. This means that the more accurate of the two models can guide the improvement of the less accurate model. Our method is evaluated on two data sets of images of real scenes, one in which the attribute is color and the other in which it is material. We show that our joint learning produces improved detectors. We demonstrate generalization by detecting attribute-object pairs which do not appear in our training data. The iteration gives significant improvement in performance.
UR - http://www.scopus.com/inward/record.url?scp=77953177673&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953177673&partnerID=8YFLogxK
DO - 10.1109/ICCV.2009.5459194
M3 - Conference contribution
AN - SCOPUS:77953177673
SN - 9781424444205
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 537
EP - 544
BT - 2009 IEEE 12th International Conference on Computer Vision, ICCV 2009
T2 - 12th International Conference on Computer Vision, ICCV 2009
Y2 - 29 September 2009 through 2 October 2009
ER -