Technical perspective: Finding people in depth

Research output: Contribution to journal › Short survey › peer-review

Abstract

Shotton and co-researchers describe a landmark computer vision system that takes a single depth image containing a person and automatically estimates the pose of the person's body in 3D. This novel method for pose estimation is the key to the Kinect's success. Because the pose is estimated independently in every frame, this tracking-by-detection approach has the potential for greater robustness: errors made over time are less likely to accumulate. It is enabled by an extremely efficient and reliable solution to the pose estimation problem. Shotton and co-researchers employ data-driven learning to address the tremendous variability in pose and appearance. Motion capture data was used to characterize the space of possible poses: actors performed gestures used in gaming and their joint angles were measured, resulting in a dataset of 100,000 poses. Given a single pose, a simulated depth image can be produced by transferring the pose to a character model and rendering the clothing and hair. The final idea is the use of discriminative part models to represent the body pose.
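
The abstract's claim that tracking by detection is less prone to accumulated error can be illustrated with a toy simulation. The sketch below is not the system described in the article: it assumes a hypothetical 1-D "joint position", Gaussian measurement noise, and a fixed per-frame motion, all chosen purely for illustration. It compares an incremental tracker, which adds a noisy motion estimate to its previous estimate, with a per-frame detector, which estimates the position from the current frame alone.

```python
import math
import random


def simulate(num_frames=200, num_trials=100, noise=1.0, seed=0):
    """Compare error growth of incremental tracking vs. per-frame detection.

    Toy 1-D setup (an assumption for illustration only): the true position
    moves by a fixed step each frame, and every measurement carries
    independent Gaussian noise with standard deviation `noise`.
    Returns (tracking_rmse, detection_rmse) over all frames and trials.
    """
    rng = random.Random(seed)
    step = 0.5  # hypothetical true motion per frame
    track_sq = 0.0
    detect_sq = 0.0
    for _ in range(num_trials):
        true_pos = 0.0
        tracked = 0.0  # the tracker starts from a perfect initialization
        for _ in range(num_frames):
            true_pos += step
            # Incremental tracking: add a noisy estimate of the frame-to-frame
            # motion to the previous estimate, so past errors are carried along.
            tracked += step + rng.gauss(0.0, noise)
            # Tracking by detection: estimate the position from the current
            # frame alone, so each frame's error is independent of the past.
            detected = true_pos + rng.gauss(0.0, noise)
            track_sq += (tracked - true_pos) ** 2
            detect_sq += (detected - true_pos) ** 2
    n = num_frames * num_trials
    return math.sqrt(track_sq / n), math.sqrt(detect_sq / n)


tracking_rmse, detection_rmse = simulate()
print(f"incremental tracking RMSE: {tracking_rmse:.2f}")
print(f"per-frame detection RMSE:  {detection_rmse:.2f}")
```

Under these assumptions the incremental tracker's error behaves like a random walk and grows with the number of frames, while the per-frame detector's error stays near the noise level. A real system would face correlated errors and outright detection failures, which this sketch does not model.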

Original language: English (US)
Pages (from-to): 115
Number of pages: 1
Journal: Communications of the ACM
Volume: 56
Issue number: 1
State: Published - Jan 2013
Externally published: Yes

ASJC Scopus subject areas

  • General Computer Science
