TY - GEN
T1 - Improving image-sentence embeddings using large weakly annotated photo collections
AU - Gong, Yunchao
AU - Wang, Liwei
AU - Hodosh, Micah
AU - Hockenmaier, Julia
AU - Lazebnik, Svetlana
PY - 2014
Y1 - 2014
AB - This paper studies the problem of associating images with descriptive sentences by embedding them in a common latent space. We are interested in learning such embeddings from hundreds of thousands or millions of examples. Unfortunately, it is prohibitively expensive to fully annotate this many training images with ground-truth sentences. Instead, we ask whether we can learn better image-sentence embeddings by augmenting small fully annotated training sets with millions of images that have weak and noisy annotations (titles, tags, or descriptions). After investigating several state-of-the-art scalable embedding methods, we introduce a new algorithm called Stacked Auxiliary Embedding that can successfully transfer knowledge from millions of weakly annotated images to improve the accuracy of retrieval-based image description.
UR - http://www.scopus.com/inward/record.url?scp=84906484732&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906484732&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-10593-2_35
DO - 10.1007/978-3-319-10593-2_35
M3 - Conference contribution
AN - SCOPUS:84906484732
SN - 9783319105925
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 529
EP - 545
BT - Computer Vision, ECCV 2014 - 13th European Conference, Proceedings
PB - Springer
T2 - 13th European Conference on Computer Vision, ECCV 2014
Y2 - 6 September 2014 through 12 September 2014
ER -