Recognizing activities in multiple views with fusion of frame judgments

Selen Pehlivan, David A. Forsyth

Research output: Contribution to journal › Article

Abstract

This paper focuses on activity recognition when multiple views are available. In the literature, this is often performed using two different approaches. In the first one, the systems build a 3D reconstruction and match that. However, there are practical disadvantages to this methodology since a sufficient number of overlapping views is needed to reconstruct, and one must calibrate the cameras. A simpler alternative is to match the frames individually. This offers significant advantages in the system architecture (e.g., it is easy to incorporate new features and camera dropouts can be tolerated). In this paper, the second approach is employed and a novel fusion method is proposed. Our fusion method collects the activity labels over frames and cameras, and then fuses activity judgments as the sequence label. It is shown that there is no performance penalty when a straightforward weighted voting scheme is used. In particular, when there are enough overlapping views to generate a volumetric reconstruction, our recognition performance is comparable with that produced by volumetric reconstructions. However, if the overlapping views are not adequate, the performance degrades fairly gracefully, even in cases where test and training views do not overlap.
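The fusion step described in the abstract — collecting per-frame activity labels from each camera and combining them into a single sequence label by weighted voting — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; the function name, the per-camera weight scheme, and the input layout (camera id mapped to a list of frame labels) are assumptions for the example.

```python
from collections import defaultdict

def fuse_frame_judgments(judgments, camera_weights=None):
    """Fuse per-frame, per-camera activity labels into one sequence label
    by weighted voting. `judgments` maps camera id -> list of frame labels;
    `camera_weights` optionally down-weights less reliable views."""
    camera_weights = camera_weights or {}
    votes = defaultdict(float)
    for cam, labels in judgments.items():
        weight = camera_weights.get(cam, 1.0)  # default: all cameras equal
        for label in labels:
            votes[label] += weight
    # The sequence label is the activity with the highest weighted vote.
    return max(votes, key=votes.get)

# Two cameras; camera "b" drops frames, which per-frame matching tolerates.
seq_label = fuse_frame_judgments(
    {"a": ["walk", "walk", "run"], "b": ["walk"]},
    camera_weights={"a": 1.0, "b": 0.5},
)
# seq_label -> "walk" (weighted votes: walk 2.5, run 1.0)
```

Because each frame is matched independently, a camera dropping out simply contributes fewer votes, which mirrors the graceful degradation the abstract reports when overlapping views are inadequate.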

Original language: English (US)
Pages (from-to): 237-249
Number of pages: 13
Journal: Image and Vision Computing
Volume: 32
Issue number: 4
DOIs: 10.1016/j.imavis.2014.01.006
State: Published - Apr 2014

Keywords

  • Human activity recognition
  • Multiple camera
  • Multiple views
  • Video analysis

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition

Cite this

Recognizing activities in multiple views with fusion of frame judgments. / Pehlivan, Selen; Forsyth, David A.

In: Image and Vision Computing, Vol. 32, No. 4, 04.2014, p. 237-249.

Research output: Contribution to journal › Article

@article{6006fd2b012a46d78cca20e35e152dcc,
title = "Recognizing activities in multiple views with fusion of frame judgments",
abstract = "This paper focuses on activity recognition when multiple views are available. In the literature, this is often performed using two different approaches. In the first one, the systems build a 3D reconstruction and match that. However, there are practical disadvantages to this methodology since a sufficient number of overlapping views is needed to reconstruct, and one must calibrate the cameras. A simpler alternative is to match the frames individually. This offers significant advantages in the system architecture (e.g., it is easy to incorporate new features and camera dropouts can be tolerated). In this paper, the second approach is employed and a novel fusion method is proposed. Our fusion method collects the activity labels over frames and cameras, and then fuses activity judgments as the sequence label. It is shown that there is no performance penalty when a straightforward weighted voting scheme is used. In particular, when there are enough overlapping views to generate a volumetric reconstruction, our recognition performance is comparable with that produced by volumetric reconstructions. However, if the overlapping views are not adequate, the performance degrades fairly gracefully, even in cases where test and training views do not overlap.",
keywords = "Human activity recognition, Multiple camera, Multiple views, Video analysis",
author = "Selen Pehlivan and Forsyth, {David A.}",
year = "2014",
month = "4",
doi = "10.1016/j.imavis.2014.01.006",
language = "English (US)",
volume = "32",
pages = "237--249",
journal = "Image and Vision Computing",
issn = "0262-8856",
publisher = "Elsevier Limited",
number = "4",
}

TY - JOUR

T1 - Recognizing activities in multiple views with fusion of frame judgments

AU - Pehlivan, Selen

AU - Forsyth, David A.

PY - 2014/4

Y1 - 2014/4

N2 - This paper focuses on activity recognition when multiple views are available. In the literature, this is often performed using two different approaches. In the first one, the systems build a 3D reconstruction and match that. However, there are practical disadvantages to this methodology since a sufficient number of overlapping views is needed to reconstruct, and one must calibrate the cameras. A simpler alternative is to match the frames individually. This offers significant advantages in the system architecture (e.g., it is easy to incorporate new features and camera dropouts can be tolerated). In this paper, the second approach is employed and a novel fusion method is proposed. Our fusion method collects the activity labels over frames and cameras, and then fuses activity judgments as the sequence label. It is shown that there is no performance penalty when a straightforward weighted voting scheme is used. In particular, when there are enough overlapping views to generate a volumetric reconstruction, our recognition performance is comparable with that produced by volumetric reconstructions. However, if the overlapping views are not adequate, the performance degrades fairly gracefully, even in cases where test and training views do not overlap.

AB - This paper focuses on activity recognition when multiple views are available. In the literature, this is often performed using two different approaches. In the first one, the systems build a 3D reconstruction and match that. However, there are practical disadvantages to this methodology since a sufficient number of overlapping views is needed to reconstruct, and one must calibrate the cameras. A simpler alternative is to match the frames individually. This offers significant advantages in the system architecture (e.g., it is easy to incorporate new features and camera dropouts can be tolerated). In this paper, the second approach is employed and a novel fusion method is proposed. Our fusion method collects the activity labels over frames and cameras, and then fuses activity judgments as the sequence label. It is shown that there is no performance penalty when a straightforward weighted voting scheme is used. In particular, when there are enough overlapping views to generate a volumetric reconstruction, our recognition performance is comparable with that produced by volumetric reconstructions. However, if the overlapping views are not adequate, the performance degrades fairly gracefully, even in cases where test and training views do not overlap.

KW - Human activity recognition

KW - Multiple camera

KW - Multiple views

KW - Video analysis

UR - http://www.scopus.com/inward/record.url?scp=84897762052&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897762052&partnerID=8YFLogxK

U2 - 10.1016/j.imavis.2014.01.006

DO - 10.1016/j.imavis.2014.01.006

M3 - Article

AN - SCOPUS:84897762052

VL - 32

SP - 237

EP - 249

JO - Image and Vision Computing

JF - Image and Vision Computing

SN - 0262-8856

IS - 4

ER -