Every picture tells a story: Generating sentences from images

Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
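The scoring idea in the abstract, comparing a meaning estimate from the image with one from the sentence, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the meaning estimates are treated as fixed-length vectors of floats (produced elsewhere by learned predictors, not shown), cosine similarity stands in for the actual comparison, and the names `Meaning`, `score`, and `best_match` are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical representation: a fixed-length vector of meaning scores.
Meaning = Sequence[float]


def score(image_meaning: Meaning, sentence_meaning: Meaning) -> float:
    """Cosine similarity between two meaning estimates (an assumed stand-in)."""
    if len(image_meaning) != len(sentence_meaning):
        raise ValueError("meaning estimates must have the same length")
    dot = sum(a * b for a, b in zip(image_meaning, sentence_meaning))
    norm = math.sqrt(sum(a * a for a in image_meaning)) * math.sqrt(
        sum(b * b for b in sentence_meaning)
    )
    # A zero vector carries no meaning estimate, so treat it as no match.
    return dot / norm if norm > 0 else 0.0


def best_match(query: Meaning, candidates: Sequence[Meaning]) -> int:
    """Return the index of the candidate that scores highest against the query."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(range(len(candidates)), key=lambda i: score(query, candidates[i]))
```

Because the assumed score is symmetric, the same `best_match` call covers both uses named in the abstract: pass an image's meaning estimate with candidate sentence estimates to pick a description, or pass a sentence's estimate with candidate image estimates to pick an illustration.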

Original language: English (US)
Title of host publication: Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
Publisher: Springer
Pages: 15-29
Number of pages: 15
Edition: PART 4
ISBN (Print): 364215560X, 9783642155604
DOIs
State: Published - 2010
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: Sep 10 2010 to Sep 11 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 4
Volume: 6314 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th European Conference on Computer Vision, ECCV 2010
Country/Territory: Greece
City: Heraklion, Crete
Period: 9/10/10 to 9/11/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
