Improving image-sentence embeddings using large weakly annotated photo collections

Yunchao Gong, Liwei Wang, Micah Hodosh, Julia Hockenmaier, Svetlana Lazebnik

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper studies the problem of associating images with descriptive sentences by embedding them in a common latent space. We are interested in learning such embeddings from hundreds of thousands or millions of examples. Unfortunately, it is prohibitively expensive to fully annotate this many training images with ground-truth sentences. Instead, we ask whether we can learn better image-sentence embeddings by augmenting small fully annotated training sets with millions of images that have weak and noisy annotations (titles, tags, or descriptions). After investigating several state-of-the-art scalable embedding methods, we introduce a new algorithm called Stacked Auxiliary Embedding that can successfully transfer knowledge from millions of weakly annotated images to improve the accuracy of retrieval-based image description.

Original language: English (US)
Title of host publication: Computer Vision, ECCV 2014 - 13th European Conference, Proceedings
Publisher: Springer-Verlag
Pages: 529-545
Number of pages: 17
Edition: PART 4
ISBN (Print): 9783319105925
State: Published - Jan 1 2014
Event: 13th European Conference on Computer Vision, ECCV 2014 - Zurich, Switzerland
Duration: Sep 6 2014 → Sep 12 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 4
Volume: 8692 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th European Conference on Computer Vision, ECCV 2014
Country: Switzerland
City: Zurich
Period: 9/6/14 → 9/12/14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
