Every picture tells a story: Generating sentences from images

Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
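
As a rough illustration of the scoring idea in the abstract, the following Python sketch compares a meaning estimate for an image with one for a sentence and uses the resulting score in both directions (picking a sentence for an image, and an image for a sentence). The abstract does not specify how meaning is represented or how the comparison is computed, so the `Meaning` triple, the slot-matching `score` function, the `best_match` helper, and all file names and sentences are assumptions made for this example. The hand-written estimates stand in for the learned discriminative predictors.

```python
from typing import Mapping, NamedTuple


class Meaning(NamedTuple):
    """Hypothetical meaning estimate; the slots are an assumption of this sketch."""
    obj: str
    action: str
    scene: str


def score(image_meaning: Meaning, sentence_meaning: Meaning) -> float:
    """Fraction of slots on which the two meaning estimates agree."""
    matches = sum(a == b for a, b in zip(image_meaning, sentence_meaning))
    return matches / len(image_meaning)


def best_match(query: Meaning, candidates: Mapping[str, Meaning]) -> str:
    """Return the key of the candidate whose meaning scores highest against the query."""
    return max(candidates, key=lambda name: score(query, candidates[name]))


# Hand-written estimates standing in for learned predictors (illustrative only).
image_estimates = {
    "img_horse.jpg": Meaning("horse", "ride", "field"),
    "img_boat.jpg": Meaning("boat", "sail", "sea"),
}
sentence_estimates = {
    "A man rides a horse in a field.": Meaning("horse", "ride", "field"),
    "A boat sails on the sea.": Meaning("boat", "sail", "sea"),
    "A horse stands in a barn.": Meaning("horse", "stand", "barn"),
}

# Attach a descriptive sentence to a given image.
caption = best_match(image_estimates["img_horse.jpg"], sentence_estimates)

# Obtain an image that illustrates a given sentence.
illustration = best_match(sentence_estimates["A boat sails on the sea."], image_estimates)

print(caption)
print(illustration)
```

Because a single score serves both directions, annotation and illustration reduce to the same ranking step with the roles of query and candidates swapped.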

Original language: English (US)
Title of host publication: Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
Publisher: Springer-Verlag
Pages: 15-29
Number of pages: 15
Edition: PART 4
ISBN (Print): 364215560X, 9783642155604
DOIs: https://doi.org/10.1007/978-3-642-15561-1_2
State: Published - Jan 1 2010
Event: 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece
Duration: Sep 10 2010 to Sep 11 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 4
Volume: 6314 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th European Conference on Computer Vision, ECCV 2010
Country: Greece
City: Heraklion, Crete
Period: 9/10/10 to 9/11/10

Fingerprint

Estimate
Linking
Narrative
Sufficient
Evaluate
Demonstrate
Meaning
Human

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Farhadi, A., Hejrati, M., Sadeghi, M. A., Young, P., Rashtchian, C., Hockenmaier, J., & Forsyth, D. (2010). Every picture tells a story: Generating sentences from images. In Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings (PART 4 ed., pp. 15-29). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6314 LNCS, No. PART 4). Springer-Verlag. https://doi.org/10.1007/978-3-642-15561-1_2

Every picture tells a story: Generating sentences from images. / Farhadi, Ali; Hejrati, Mohsen; Sadeghi, Mohammad Amin; Young, Peter; Rashtchian, Cyrus; Hockenmaier, Julia; Forsyth, David.

Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings. PART 4 ed. Springer-Verlag, 2010. p. 15-29 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6314 LNCS, No. PART 4).

Farhadi, A, Hejrati, M, Sadeghi, MA, Young, P, Rashtchian, C, Hockenmaier, J & Forsyth, D 2010, Every picture tells a story: Generating sentences from images. in Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings. PART 4 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 4, vol. 6314 LNCS, Springer-Verlag, pp. 15-29, 11th European Conference on Computer Vision, ECCV 2010, Heraklion, Crete, Greece, 9/10/10. https://doi.org/10.1007/978-3-642-15561-1_2
Farhadi A, Hejrati M, Sadeghi MA, Young P, Rashtchian C, Hockenmaier J et al. Every picture tells a story: Generating sentences from images. In Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings. PART 4 ed. Springer-Verlag. 2010. p. 15-29. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 4). https://doi.org/10.1007/978-3-642-15561-1_2
Farhadi, Ali; Hejrati, Mohsen; Sadeghi, Mohammad Amin; Young, Peter; Rashtchian, Cyrus; Hockenmaier, Julia; Forsyth, David. / Every picture tells a story: Generating sentences from images. Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings. PART 4 ed. Springer-Verlag, 2010. pp. 15-29 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 4).
@inproceedings{6d7effd016974b2b88c071353ad5db9b,
title = "Every picture tells a story: Generating sentences from images",
abstract = "Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.",
author = "Ali Farhadi and Mohsen Hejrati and Sadeghi, {Mohammad Amin} and Peter Young and Cyrus Rashtchian and Julia Hockenmaier and David Forsyth",
year = "2010",
month = "1",
day = "1",
doi = "10.1007/978-3-642-15561-1_2",
language = "English (US)",
isbn = "364215560X",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
number = "PART 4",
pages = "15--29",
booktitle = "Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings",
edition = "PART 4",

}

TY - GEN

T1 - Every picture tells a story

T2 - Generating sentences from images

AU - Farhadi, Ali

AU - Hejrati, Mohsen

AU - Sadeghi, Mohammad Amin

AU - Young, Peter

AU - Rashtchian, Cyrus

AU - Hockenmaier, Julia

AU - Forsyth, David

PY - 2010/1/1

Y1 - 2010/1/1

N2 - Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.

AB - Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.

UR - http://www.scopus.com/inward/record.url?scp=78149311145&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78149311145&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-15561-1_2

DO - 10.1007/978-3-642-15561-1_2

M3 - Conference contribution

AN - SCOPUS:78149311145

SN - 364215560X

SN - 9783642155604

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 15

EP - 29

BT - Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings

PB - Springer-Verlag

ER -