Visual scenes are categorized by function

Michelle R. Greene, Christopher Baldassano, Andre Esteva, Diane M. Beck, Li Fei-Fei

Research output: Contribution to journal › Article

Abstract

How do we know that a kitchen is a kitchen by looking? Traditional models posit that scene categorization is achieved through recognizing necessary and sufficient features and objects, yet there is little consensus about what these may be. However, scene categories should reflect how we use visual information. Therefore, we test the hypothesis that scene categories reflect functions, or the possibilities for actions within a scene. Our approach is to compare human categorization patterns with predictions made by both functions and alternative models. We collected a large-scale scene category distance matrix (5 million trials) by asking observers to simply decide whether 2 images were from the same or different categories. Using the actions from the American Time Use Survey, we mapped actions onto each scene (1.4 million trials). We found a strong relationship between ranked category distance and functional distance (r = .50, or 66% of the maximum possible correlation). The function model outperformed alternative models of object-based distance (r = .33), visual features from a convolutional neural network (r = .39), lexical distance (r = .27), and models of visual features. Using hierarchical linear regression, we found that functions captured 85.5% of overall explained variance, with nearly half of the explained variance captured only by functions, implying that the predictive power of alternative models was because of their shared variance with the function-based model. These results challenge the dominant school of thought that visual features and objects are sufficient for scene categorization, suggesting instead that a scene's category may be determined by the scene's function.
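The core analysis described in the abstract — a rank correlation between pairwise category distances from human judgments and distances predicted by a function model — can be illustrated with a minimal sketch. The matrices below are invented toy data, not the study's actual 5-million-trial distance matrix; `scipy.stats.spearmanr` handles the ranking step:

```python
# Toy illustration of correlating two scene-category distance matrices
# by rank, in the spirit of the analysis described above.
# All data here are simulated for demonstration only.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_categories = 6

# Hypothetical symmetric distance matrix from human same/different judgments.
human = rng.random((n_categories, n_categories))
human = (human + human.T) / 2
np.fill_diagonal(human, 0.0)

# Hypothetical function-model distances: the human matrix plus noise,
# so some (but not all) structure is shared.
functional = human + 0.3 * rng.random((n_categories, n_categories))
functional = (functional + functional.T) / 2
np.fill_diagonal(functional, 0.0)

# Only the upper triangle holds unique category pairs.
iu = np.triu_indices(n_categories, k=1)
rho, p = spearmanr(human[iu], functional[iu])
print(f"Spearman rho = {rho:.2f}")
```

Comparing only the upper triangles avoids double-counting symmetric pairs and excluding the zero diagonal, which would otherwise inflate the correlation.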

Original language: English (US)
Pages (from-to): 82-94
Number of pages: 13
Journal: Journal of Experimental Psychology: General
Volume: 145
Issue number: 1
DOI: 10.1037/xge0000129
State: Published - Jan 1 2016

Keywords

  • Categorization
  • Scene understanding
  • Similarity

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Psychology(all)
  • Developmental Neuroscience

Cite this

Visual scenes are categorized by function. / Greene, Michelle R.; Baldassano, Christopher; Esteva, Andre; Beck, Diane M.; Fei-Fei, Li.

In: Journal of Experimental Psychology: General, Vol. 145, No. 1, 01.01.2016, p. 82-94.

Greene, MR, Baldassano, C, Esteva, A, Beck, DM & Fei-Fei, L 2016, 'Visual scenes are categorized by function', Journal of Experimental Psychology: General, vol. 145, no. 1, pp. 82-94. https://doi.org/10.1037/xge0000129
Greene, Michelle R. ; Baldassano, Christopher ; Esteva, Andre ; Beck, Diane M. ; Fei-Fei, Li. / Visual scenes are categorized by function. In: Journal of Experimental Psychology: General. 2016 ; Vol. 145, No. 1. pp. 82-94.
@article{2dd1894a03144ac387c9074cfbb15a91,
title = "Visual scenes are categorized by function",
abstract = "How do we know that a kitchen is a kitchen by looking? Traditional models posit that scene categorization is achieved through recognizing necessary and sufficient features and objects, yet there is little consensus about what these may be. However, scene categories should reflect how we use visual information. Therefore, we test the hypothesis that scene categories reflect functions, or the possibilities for actions within a scene. Our approach is to compare human categorization patterns with predictions made by both functions and alternative models. We collected a large-scale scene category distance matrix (5 million trials) by asking observers to simply decide whether 2 images were from the same or different categories. Using the actions from the American Time Use Survey, we mapped actions onto each scene (1.4 million trials). We found a strong relationship between ranked category distance and functional distance (r =.50, or 66{\%} of the maximum possible correlation). The function model outperformed alternative models of object-based distance (r =.33), visual features from a convolutional neural network (r =.39), lexical distance (r =.27), and models of visual features. Using hierarchical linear regression, we found that functions captured 85.5{\%} of overall explained variance, with nearly half of the explained variance captured only by functions, implying that the predictive power of alternative models was because of their shared variance with the function-based model. These results challenge the dominant school of thought that visual features and objects are sufficient for scene categorization, suggesting instead that a scene's category may be determined by the scene's function.",
keywords = "Categorization, Scene understanding, Similarity",
author = "Greene, {Michelle R.} and Christopher Baldassano and Andre Esteva and Beck, {Diane M.} and Li Fei-Fei",
year = "2016",
month = "1",
day = "1",
doi = "10.1037/xge0000129",
language = "English (US)",
volume = "145",
pages = "82--94",
journal = "Journal of Experimental Psychology: General",
issn = "0096-3445",
publisher = "American Psychological Association Inc.",
number = "1",
}

TY - JOUR
T1 - Visual scenes are categorized by function
AU - Greene, Michelle R.
AU - Baldassano, Christopher
AU - Esteva, Andre
AU - Beck, Diane M.
AU - Fei-Fei, Li
PY - 2016/1/1
Y1 - 2016/1/1
N2 - How do we know that a kitchen is a kitchen by looking? Traditional models posit that scene categorization is achieved through recognizing necessary and sufficient features and objects, yet there is little consensus about what these may be. However, scene categories should reflect how we use visual information. Therefore, we test the hypothesis that scene categories reflect functions, or the possibilities for actions within a scene. Our approach is to compare human categorization patterns with predictions made by both functions and alternative models. We collected a large-scale scene category distance matrix (5 million trials) by asking observers to simply decide whether 2 images were from the same or different categories. Using the actions from the American Time Use Survey, we mapped actions onto each scene (1.4 million trials). We found a strong relationship between ranked category distance and functional distance (r =.50, or 66% of the maximum possible correlation). The function model outperformed alternative models of object-based distance (r =.33), visual features from a convolutional neural network (r =.39), lexical distance (r =.27), and models of visual features. Using hierarchical linear regression, we found that functions captured 85.5% of overall explained variance, with nearly half of the explained variance captured only by functions, implying that the predictive power of alternative models was because of their shared variance with the function-based model. These results challenge the dominant school of thought that visual features and objects are sufficient for scene categorization, suggesting instead that a scene's category may be determined by the scene's function.
AB - How do we know that a kitchen is a kitchen by looking? Traditional models posit that scene categorization is achieved through recognizing necessary and sufficient features and objects, yet there is little consensus about what these may be. However, scene categories should reflect how we use visual information. Therefore, we test the hypothesis that scene categories reflect functions, or the possibilities for actions within a scene. Our approach is to compare human categorization patterns with predictions made by both functions and alternative models. We collected a large-scale scene category distance matrix (5 million trials) by asking observers to simply decide whether 2 images were from the same or different categories. Using the actions from the American Time Use Survey, we mapped actions onto each scene (1.4 million trials). We found a strong relationship between ranked category distance and functional distance (r =.50, or 66% of the maximum possible correlation). The function model outperformed alternative models of object-based distance (r =.33), visual features from a convolutional neural network (r =.39), lexical distance (r =.27), and models of visual features. Using hierarchical linear regression, we found that functions captured 85.5% of overall explained variance, with nearly half of the explained variance captured only by functions, implying that the predictive power of alternative models was because of their shared variance with the function-based model. These results challenge the dominant school of thought that visual features and objects are sufficient for scene categorization, suggesting instead that a scene's category may be determined by the scene's function.
KW - Categorization
KW - Scene understanding
KW - Similarity
UR - http://www.scopus.com/inward/record.url?scp=84952802314&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84952802314&partnerID=8YFLogxK
U2 - 10.1037/xge0000129
DO - 10.1037/xge0000129
M3 - Article
C2 - 26709590
AN - SCOPUS:84952802314
VL - 145
SP - 82
EP - 94
JO - Journal of Experimental Psychology: General
JF - Journal of Experimental Psychology: General
SN - 0096-3445
IS - 1
ER -