Abstract
In this paper we describe a novel generative model for video analysis called the transformed hidden Markov model (THMM). The video sequence is modeled as a set of frames generated by transforming a small number of class images that summarize the sequence. For each frame, the transformation and the class are discrete latent variables that depend on the class and transformation of the previous frame. The set of possible transformations is defined in advance and can include a variety of transformations, such as translation, rotation, and shearing. At each step of the Markov model, a new frame is generated from a transformed Gaussian distribution determined by the class/transformation combination produced by the Markov chain. This model can be viewed as an extension of the transformed mixture of Gaussians [1] through time. We use this model to cluster unlabeled video segments and form a video summary in an unsupervised fashion. We also use the trained models to perform tracking, image stabilization, and filtering. We demonstrate that the THMM can combine long-term dependencies in video sequences (similar frames recurring in remote parts of the sequence) with short-term dependencies (such as similarity between nearby frames and motion patterns) to better summarize and process a video sequence, even in the presence of high levels of white or structured noise (such as foreground occlusion).
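The generative process described in the abstract can be illustrated with a small sampling sketch. This is not the authors' implementation: it assumes 1-D "images", circular shifts as the only transformation type, isotropic Gaussian noise, a uniform initial state, and a hypothetical "sticky" transition matrix over joint (class, transformation) states. The names `shift`, `sticky_transitions`, and `sample_thmm` are made up for this illustration.

```python
import random


def shift(image, k):
    """Circularly translate a 1-D image by k pixels (stand-in for a transformation)."""
    k %= len(image)
    return image[-k:] + image[:-k] if k else list(image)


def sticky_transitions(n_states, stay):
    """Hypothetical transition matrix: stay with probability `stay`, else move uniformly."""
    move = (1.0 - stay) / (n_states - 1)
    return [[stay if i == j else move for j in range(n_states)]
            for i in range(n_states)]


def sample_thmm(class_means, shifts, trans_prob, noise_std, n_frames, seed=0):
    """Sample frames from a toy THMM-like model.

    Each hidden state is a (class, transformation) pair. A frame is the
    transformed class mean plus Gaussian noise; the next state depends only
    on the current state (first-order Markov chain).
    """
    rng = random.Random(seed)
    n_states = len(class_means) * len(shifts)
    state = rng.randrange(n_states)  # uniform initial state (assumption)
    frames, states = [], []
    for _ in range(n_frames):
        c, t = divmod(state, len(shifts))        # decode joint state index
        mean = shift(class_means[c], shifts[t])  # transformed class image
        frames.append([m + rng.gauss(0.0, noise_std) for m in mean])
        states.append((c, shifts[t]))
        # Draw the next joint state from the current state's transition row.
        state = rng.choices(range(n_states), weights=trans_prob[state])[0]
    return frames, states


if __name__ == "__main__":
    means = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 5.0, 5.0]]  # two toy class images
    shifts = [0, 1]                                       # allowed translations
    P = sticky_transitions(len(means) * len(shifts), stay=0.9)
    frames, states = sample_thmm(means, shifts, P, noise_std=0.1, n_frames=5)
    for s, f in zip(states, frames):
        print(s, [round(x, 2) for x in f])
```

Treating each (class, transformation) pair as a single joint state keeps the sketch short. Inference and learning (for example with the forward-backward algorithm and EM) are outside the scope of this sketch.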
Original language | English (US) |
---|---|
Pages (from-to) | 26-33 |
Number of pages | 8 |
Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
Volume | 2 |
State | Published - 2000 |
Event | IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2000 - Hilton Head Island, SC, USA |
Duration | Jun 13 2000 → Jun 15 2000 |
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition