Exploiting nonlocal spatiotemporal structure for video segmentation

Hsien Ting Cheng, Narendra Ahuja

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unsupervised video segmentation is a challenging problem because it involves a large amount of data, and image segments undergo noisy variations in color, texture, and motion over time. However, there are significant redundancies that can help disambiguate the effects of noise. To exploit these redundancies and obtain the most spatio-temporally consistent video segmentation, we formulate the problem as a consistent labeling problem that exploits higher-order image structure. A label stands for a specific moving segment. Each segment (or region) is treated as a random variable to be assigned a label. Regions assigned the same label comprise a 3D space-time segment, or a region tube. Labels can also be created or terminated automatically at any frame in the video sequence, to allow objects to enter or leave the scene. We formulate this problem with a CRF (conditional random field) model. Unlike a conventional CRF, which has only unary and binary potentials, we also use higher-order potentials to favor label consistency among disconnected spatial and temporal segments. Compared to region-tracking-based methods, the main advantages of the proposed algorithm are twofold: (1) the label-consistency constraints are imposed on multiple regions, but in a soft manner, and (2) the labeling decision is postponed until confidence in the labeling is high. We compare our results with a recent state-of-the-art video segmentation algorithm and show that our results are quantitatively and qualitatively better.
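The labeling energy the abstract describes can be sketched as a toy example. The following is an illustrative sketch only, not the authors' implementation: it uses a robust P^n-Potts-style higher-order term as a stand-in for the paper's higher-order potentials, and all region names, costs, and the `gamma` penalty are invented here for demonstration.

```python
from collections import Counter

def crf_energy(labels, unary, pairwise, cliques, gamma=1.0):
    """Toy CRF energy over region labels: unary + Potts pairwise + a robust
    higher-order label-consistency term over cliques of regions.

    labels:   dict region -> label assignment
    unary:    dict (region, label) -> cost of that assignment
    pairwise: list of (region_a, region_b, weight); weight charged if labels differ
    cliques:  lists of regions expected to share a label (e.g. one region tube)
    gamma:    per-region penalty for deviating from a clique's majority label
    """
    # Unary term: cost of each region's individual label choice.
    e = sum(unary[(r, l)] for r, l in labels.items())
    # Pairwise Potts term: penalize neighboring regions with different labels.
    e += sum(w for a, b, w in pairwise if labels[a] != labels[b])
    # Higher-order term: soft consistency — pay gamma for every region in a
    # clique that disagrees with the clique's majority label.
    for clique in cliques:
        counts = Counter(labels[r] for r in clique)
        majority = counts.most_common(1)[0][1]
        e += gamma * (len(clique) - majority)
    return e

# Hypothetical three-region example with labels {0, 1}.
unary = {("r1", 0): 0.1, ("r1", 1): 0.9,
         ("r2", 0): 0.2, ("r2", 1): 0.8,
         ("r3", 0): 0.7, ("r3", 1): 0.3}
pairwise = [("r1", "r2", 0.5)]
cliques = [["r1", "r2", "r3"]]  # one clique: regions of one putative tube

consistent = crf_energy({"r1": 0, "r2": 0, "r3": 0}, unary, pairwise, cliques, gamma=0.5)
mixed      = crf_energy({"r1": 0, "r2": 0, "r3": 1}, unary, pairwise, cliques, gamma=0.5)
```

Because the higher-order penalty counts disagreeing regions rather than forbidding them outright, the consistency constraint is soft in the sense the abstract describes: here the all-consistent labeling (energy 1.0) wins over the mixed one (energy 1.1), but a large enough unary advantage could still justify splitting a clique.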

Original language: English (US)
Title of host publication: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
Pages: 741-748
Number of pages: 8
ISBN (Print): 9781467312264
DOI: 10.1109/CVPR.2012.6247744
State: Published - Oct 1 2012
Event: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012 - Providence, RI, United States
Duration: Jun 16 2012 - Jun 21 2012

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Cite this

Cheng, H. T., & Ahuja, N. (2012). Exploiting nonlocal spatiotemporal structure for video segmentation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012 (pp. 741-748). [6247744] (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). https://doi.org/10.1109/CVPR.2012.6247744

