Cross-caption coreference resolution for automatic image understanding

Micah Hodosh, Peter Young, Cyrus Rashtchian, Julia Hockenmaier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent work in computer vision has aimed to associate image regions with keywords describing the depicted entities, but actual image 'understanding' would also require identifying the entities' attributes, relations, and activities. Since this information cannot be conveyed by simple keywords, we have collected a corpus of "action" photos, each associated with five descriptive captions. To obtain a consistent semantic representation for each image, we first need to identify which noun phrases (NPs) refer to the same entities. We present three hierarchical Bayesian models for cross-caption coreference resolution. We have also created a simple ontology of the entity classes that appear in images and evaluate how well these classes can be recovered.
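To make the task concrete, the following is a minimal sketch of cross-caption coreference on a single image: given noun phrases extracted from five captions, cluster those that refer to the same entity. This naive head-noun baseline is purely illustrative and is not one of the paper's hierarchical Bayesian models; the captions, NP lists, and head-to-class table are hypothetical stand-ins for the paper's corpus and entity ontology.

# Illustrative sketch of the cross-caption coreference task (NOT the
# paper's models): cluster noun phrases from five captions of one photo
# by the entity class of their head noun. All data below is hypothetical.
from collections import defaultdict

# Five hypothetical captions for one "action" photo, with their noun
# phrases already extracted (in practice an NP chunker would supply these).
caption_nps = [
    ["a man", "a red bike"],
    ["the cyclist", "a bicycle"],
    ["a man", "his bike", "the street"],
    ["a rider", "a bike"],
    ["the man", "a bicycle", "a city street"],
]

# Hypothetical synonym table standing in for an entity-class ontology:
# head nouns that should map to the same entity class.
HEAD_CLASSES = {
    "man": "person", "cyclist": "person", "rider": "person",
    "bike": "vehicle", "bicycle": "vehicle",
    "street": "scene",
}

def head_noun(np: str) -> str:
    """Take the last token of the NP as its head (a crude approximation)."""
    return np.split()[-1]

def cluster_nps(captions):
    """Group NPs from all captions by the entity class of their head noun."""
    clusters = defaultdict(list)
    for i, nps in enumerate(captions):
        for np in nps:
            cls = HEAD_CLASSES.get(head_noun(np), head_noun(np))
            clusters[cls].append((i, np))
    return dict(clusters)

if __name__ == "__main__":
    for entity, mentions in cluster_nps(caption_nps).items():
        print(entity, "->", [np for _, np in mentions])

Running this groups "a man", "the cyclist", and "a rider" under one entity; the paper's Bayesian models infer such cross-caption groupings jointly rather than through fixed string heuristics.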

Original language: English (US)
Title of host publication: CoNLL 2010 - Fourteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
Pages: 162-171
Number of pages: 10
State: Published - 2010
Event: 14th Conference on Computational Natural Language Learning, CoNLL 2010 - Uppsala, Sweden
Duration: Jul 15, 2010 - Jul 16, 2010

Publication series

Name: CoNLL 2010 - Fourteenth Conference on Computational Natural Language Learning, Proceedings of the Conference

Other

Other: 14th Conference on Computational Natural Language Learning, CoNLL 2010
Country/Territory: Sweden
City: Uppsala
Period: 7/15/10 - 7/16/10

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language
