Computational studies of human motion: Part 1, tracking and motion synthesis

David A. Forsyth, Okan Arikan, Leslie Ikemoto, James O'Brien, Deva Ramanan

Research output: Contribution to journal (article)

Abstract

We review methods for kinematic tracking of the human body in video. The review is part of a projected book that is intended to cross-fertilize ideas about motion representation between the animation and computer vision communities. The review confines itself to the earlier stages of motion, focusing on tracking and motion synthesis; future material will cover activity representation and motion generation. In general, we take the position that tracking does not necessarily involve (as is usually thought) complex multimodal inference problems. Instead, there are two key problems, both easy to state. The first is lifting, where one must infer the configuration of the body in three dimensions from image data. Ambiguities in lifting can result in multimodal inference problems, and we review what little is known about the extent to which a lift is ambiguous. The second is data association, where one must determine which pixels in an image come from the body. We see a tracking-by-detection approach as the most productive, and review various human detection methods. Lifting, and a variety of other problems, can be simplified by observing temporal structure in motion, and we review the literature on data-driven human animation to expose what is known about this structure. Accurate generative models of human motion would be extremely useful in both animation and tracking, and we discuss the profound difficulties encountered in building such models. Discriminative methods, which should be able to tell whether an observed motion is human or not, do not work well yet, and we discuss why. There is an extensive discussion of open issues. In particular, we discuss the nature and extent of lifting ambiguities, which appear to be significant at short timescales and insignificant at longer timescales. This discussion suggests that the best tracking strategy is to track a 2D representation, and then lift it.
We point out some puzzling phenomena associated with the choice of human motion representation: joint angles vs. joint positions. Finally, we give a quick guide to resources.

Original language: English (US)
Pages (from-to): 77-254
Number of pages: 178
Journal: Foundations and Trends in Computer Graphics and Vision
Volume: 1
Issue number: 2-3
DOI: 10.1561/0600000005
State: Published - Dec 1 2006

Fingerprint

Animation
Computer vision
Kinematics
Pixels
Association reactions

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Computational studies of human motion: Part 1, tracking and motion synthesis. / Forsyth, David A.; Arikan, Okan; Ikemoto, Leslie; O'Brien, James; Ramanan, Deva.

In: Foundations and Trends in Computer Graphics and Vision, Vol. 1, No. 2-3, 01.12.2006, p. 77-254.

Forsyth, David A.; Arikan, Okan; Ikemoto, Leslie; O'Brien, James; Ramanan, Deva. / Computational studies of human motion: Part 1, tracking and motion synthesis. In: Foundations and Trends in Computer Graphics and Vision. 2006; Vol. 1, No. 2-3. pp. 77-254.

@article{97e89008b7b646e1b2694c512a484f24,
title = "Computational studies of human motion: Part 1, tracking and motion synthesis",
author = "Forsyth, {David A.} and Okan Arikan and Leslie Ikemoto and James O'Brien and Deva Ramanan",
year = "2006",
month = "12",
day = "1",
doi = "10.1561/0600000005",
language = "English (US)",
volume = "1",
pages = "77--254",
journal = "Foundations and Trends in Computer Graphics and Vision",
issn = "1572-2740",
publisher = "Now Publishers Inc",
number = "2-3",
}

TY - JOUR

T1 - Computational studies of human motion

T2 - Part 1, tracking and motion synthesis

AU - Forsyth, David A.

AU - Arikan, Okan

AU - Ikemoto, Leslie

AU - O'Brien, James

AU - Ramanan, Deva

PY - 2006/12/1

Y1 - 2006/12/1

UR - http://www.scopus.com/inward/record.url?scp=33846225279&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846225279&partnerID=8YFLogxK

U2 - 10.1561/0600000005

DO - 10.1561/0600000005

M3 - Article

AN - SCOPUS:33846225279

VL - 1

SP - 77

EP - 254

JO - Foundations and Trends in Computer Graphics and Vision

JF - Foundations and Trends in Computer Graphics and Vision

SN - 1572-2740

IS - 2-3

ER -