Words and pictures: Categories, modifiers, depiction, and iconography

D. A. Forsyth, Tamara Berg, Cecilia Ovesdotter Alm, Ali Farhadi, Julia Hockenmaier, Nicolas Loeff, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding (Chapter)


Collections of digital pictures are now very common. Collections can range from a small set of family pictures, to the entire contents of a picture site like Flickr. Such collections differ from what one might see if one simply attached a camera to a robot and recorded everything, because the pictures have been selected by people. They are not necessarily “good” pictures (say, by standards of photographic aesthetics), but, because they have been chosen, they display quite strong trends. It is common for such pictures to have associated text, which might be keywords or tags but is often in the form of sentences or brief paragraphs. Text could be a caption (a set of remarks explicitly bound to the picture, and often typeset in a way that emphasizes this), region labels (terms associated with image regions, perhaps identifying what is in that region), annotations (terms associated with the whole picture, often identifying objects in the picture), or just nearby text. We review a series of ideas about how to exploit associated text to help interpret pictures.

Word Frequencies, Objects, and Scenes

Most pictures in electronic form seem to have related words nearby (or sound or metadata, and so on; we focus on words), so it is easy to collect word and picture datasets, and there are many examples. Such multimode collections should probably be seen as the usual case, because one usually has to deliberately ignore information to collect only images.

Original language: English (US)
Title of host publication: Object Categorization
Subtitle of host publication: Computer and Human Vision Perspectives
Publisher: Cambridge University Press
Number of pages: 15
ISBN (Electronic): 9780511635465
ISBN (Print): 9780521887380
State: Published - Jan 1 2009

ASJC Scopus subject areas

  • General Computer Science

