Words and pictures: Categories, modifiers, depiction, and iconography

David Alexander Forsyth, Tamara Berg, Cecilia Ovesdotter Alm, Ali Farhadi, Julia Constanze Hockenmaier, Nicolas Loeff, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Collections of digital pictures are now very common. Collections can range from a small set of family pictures, to the entire contents of a picture site like Flickr. Such collections differ from what one might see if one simply attached a camera to a robot and recorded everything, because the pictures have been selected by people. They are not necessarily “good” pictures (say, by standards of photographic aesthetics), but, because they have been chosen, they display quite strong trends. It is common for such pictures to have associated text, which might be keywords or tags but is often in the form of sentences or brief paragraphs. Text could be a caption (a set of remarks explicitly bound to the picture, and often typeset in a way that emphasizes this), region labels (terms associated with image regions, perhaps identifying what is in that region), annotations (terms associated with the whole picture, often identifying objects in the picture), or just nearby text. We review a series of ideas about how to exploit associated text to help interpret pictures.

Word Frequencies, Objects, and Scenes

Most pictures in electronic form seem to have related words nearby (or sound or metadata, and so on; we focus on words), so it is easy to collect word and picture datasets, and there are many examples. Such multimode collections should probably be seen as the usual case, because one usually has to deliberately ignore information to collect only images.

Original language: English (US)
Title of host publication: Object Categorization
Subtitle of host publication: Computer and Human Vision Perspectives
Publisher: Cambridge University Press
Pages: 167-181
Number of pages: 15
Volume: 9780521887380
ISBN (Electronic): 9780511635465
ISBN (Print): 9780521887380
DOIs: https://doi.org/10.1017/CBO9780511635465.010
State: Published - Jan 1 2009

ASJC Scopus subject areas

  • Computer Science (all)

  • Cite this

    Forsyth, D. A., Berg, T., Alm, C. O., Farhadi, A., Hockenmaier, J. C., Loeff, N., & Wang, G. (2009). Words and pictures: Categories, modifiers, depiction, and iconography. In Object Categorization: Computer and Human Vision Perspectives (Vol. 9780521887380, pp. 167-181). Cambridge University Press. https://doi.org/10.1017/CBO9780511635465.010