Learning a sparse representation for object detection

Shivani Agarwal, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We present an approach for learning to detect objects in still gray images that is based on a sparse, part-based representation of objects. A vocabulary of information-rich object parts is automatically constructed from a set of sample images of the object class of interest. Images are then represented using parts from this vocabulary, along with spatial relations observed among them. Based on this representation, a feature-efficient learning algorithm is used to learn to detect instances of the object class. The framework developed can be applied to any object with distinguishable parts in a relatively fixed spatial configuration. We report experiments on images of side views of cars. Our experiments show that the method achieves high detection accuracy on a difficult test set of real-world images, and is highly robust to partial occlusion and background variation. In addition, we discuss and offer solutions to several methodological issues that are significant for the research community to be able to evaluate object detection approaches.
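As a rough illustration of what "feature-efficient learning" over a sparse representation can look like, the sketch below implements Winnow, a well-known mistake-driven linear classifier with multiplicative updates. The abstract does not name the learning algorithm, so the choice of Winnow here is an assumption made for illustration, as are the class name `Winnow`, the helper `train`, and the integer feature IDs, which stand in for hypothetical "part present" or "spatial relation present" features. Each image window is represented only by the set of features active in it.

```python
from typing import Iterable, List, Optional, Set, Tuple

Example = Tuple[Set[int], bool]  # (active feature IDs, is_object)


class Winnow:
    """Winnow classifier over sparse binary features (illustrative sketch)."""

    def __init__(self, n_features: int, alpha: float = 2.0,
                 threshold: Optional[float] = None) -> None:
        # All weights start at 1; the default threshold is the feature count.
        self.weights: List[float] = [1.0] * n_features
        self.alpha = alpha
        self.threshold = float(n_features) if threshold is None else threshold

    def score(self, active: Iterable[int]) -> float:
        # Only active features contribute, so cost scales with sparsity.
        return sum(self.weights[i] for i in active)

    def predict(self, active: Iterable[int]) -> bool:
        return self.score(active) >= self.threshold

    def update(self, active: Set[int], label: bool) -> bool:
        """Apply one mistake-driven update; return True if a mistake was made."""
        if self.predict(active) == label:
            return False
        # Promote active weights on a missed positive, demote on a false alarm.
        factor = self.alpha if label else 1.0 / self.alpha
        for i in active:
            self.weights[i] *= factor
        return True


def train(model: Winnow, examples: List[Example], epochs: int = 10) -> Winnow:
    """Cycle through the examples until an epoch passes with no mistakes."""
    for _ in range(epochs):
        mistakes = sum(model.update(active, label) for active, label in examples)
        if mistakes == 0:
            break
    return model


if __name__ == "__main__":
    # Toy data: a window is "positive" when hypothetical feature 0 or 1 is active.
    data: List[Example] = [
        ({0, 3}, True), ({1, 4}, True), ({2, 3}, False),
        ({4, 5, 6}, False), ({0, 1}, True), ({5, 7}, False),
    ]
    model = train(Winnow(n_features=8), data)
    print([model.predict(active) for active, _ in data])
```

Because weights change only for features that are active in a misclassified example, irrelevant features are seldom touched. This is the property that makes multiplicative-update learners a natural fit for large, sparse feature vocabularies.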

Original language: English (US)
Title of host publication: Computer Vision - ECCV 2002 - 7th European Conference on Computer Vision, Proceedings
Editors: Anders Heyden, Gunnar Sparr, Mads Nielsen, Peter Johansen
Number of pages: 15
ISBN (Electronic): 9783540437482
State: Published - 2002
Externally published: Yes
Event: 7th European Conference on Computer Vision, ECCV 2002 - Copenhagen, Denmark
Duration: May 28 2002 → May 31 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 7th European Conference on Computer Vision, ECCV 2002

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

