Every picture tells a story: Generating sentences from images

Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
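The abstract describes one score that links an image to a sentence by comparing an estimate of meaning from each side. That same score is used in both directions: to pick a sentence for an image, or an image for a sentence. The following is only a minimal sketch of that idea under stated assumptions. Each estimate of meaning is represented as a hypothetical sparse dictionary of label scores, and the comparison is cosine similarity. The abstract specifies neither the representation nor the comparison function, so this is not the authors' method.

```python
import math

# ASSUMPTION: an "estimate of meaning" is a sparse dict mapping hypothetical
# meaning labels (strings) to non-negative scores. Requires Python 3.9+.
Meaning = dict[str, float]


def cosine(a: Meaning, b: Meaning) -> float:
    """Cosine similarity between two sparse score vectors (0.0 if either is all zeros)."""
    # Only labels present in both estimates contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def annotate(image_meaning: Meaning, sentences: dict[str, Meaning]) -> str:
    """Return the candidate sentence whose meaning estimate best matches the image's."""
    return max(sentences, key=lambda s: cosine(image_meaning, sentences[s]))


def illustrate(sentence_meaning: Meaning, images: dict[str, Meaning]) -> str:
    """Return the candidate image whose meaning estimate best matches the sentence's."""
    return max(images, key=lambda i: cosine(sentence_meaning, images[i]))
```

For example, an image estimated as {"dog": 0.9, "run": 0.7, "grass": 0.6} is annotated with "A dog runs on the grass." rather than "A man rides a horse.", because only the first sentence's estimate shares labels with it. Reusing one symmetric score for both annotate and illustrate mirrors the abstract's point that a single image-sentence score serves both tasks.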

Original language: English (US)
Title of host publication: Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
Publisher: Springer-Verlag Berlin Heidelberg
Pages: 15-29
Number of pages: 15
Edition: PART 4
ISBN (Print): 364215560X, 9783642155604
DOI: https://doi.org/10.1007/978-3-642-15561-1_2
State: Published - Jan 1 2010
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: Sep 10 2010 to Sep 11 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 4
Volume: 6314 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th European Conference on Computer Vision, ECCV 2010
Country: Greece
City: Heraklion, Crete
Period: 9/10/10 to 9/11/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Citation

Farhadi, A., Hejrati, M., Sadeghi, M. A., Young, P., Rashtchian, C., Hockenmaier, J., & Forsyth, D. (2010). Every picture tells a story: Generating sentences from images. In Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings (PART 4 ed., pp. 15-29). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6314 LNCS, No. PART 4). Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-642-15561-1_2