Stochastic variational video prediction

Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, R H Campbell, Sergey Levine

Research output: Contribution to conferencePaper

Abstract

Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world videos. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.

Original languageEnglish (US)
StatePublished - Jan 1 2018
Event6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018May 3 2018

Conference

Conference6th International Conference on Learning Representations, ICLR 2018
CountryCanada
CityVancouver
Period4/30/185/3/18

Fingerprint

video
predictive model
Real World
Prediction
event

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Computer Science Applications
  • Linguistics and Language

Cite this

Babaeizadeh, M., Finn, C., Erhan, D., Campbell, R. H., & Levine, S. (2018). Stochastic variational video prediction. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.

Stochastic variational video prediction. / Babaeizadeh, Mohammad; Finn, Chelsea; Erhan, Dumitru; Campbell, R H; Levine, Sergey.

2018. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.

Research output: Contribution to conferencePaper

Babaeizadeh, M, Finn, C, Erhan, D, Campbell, RH & Levine, S 2018, 'Stochastic variational video prediction' Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada, 4/30/18 - 5/3/18, .
Babaeizadeh M, Finn C, Erhan D, Campbell RH, Levine S. Stochastic variational video prediction. 2018. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.
Babaeizadeh, Mohammad ; Finn, Chelsea ; Erhan, Dumitru ; Campbell, R H ; Levine, Sergey. / Stochastic variational video prediction. Paper presented at 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada.
@conference{db72de96fb8c428ba7f08f4953dc8546,
title = "Stochastic variational video prediction",
abstract = "Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world videos. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.",
author = "Mohammad Babaeizadeh and Chelsea Finn and Dumitru Erhan and Campbell, {R H} and Sergey Levine",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
note = "6th International Conference on Learning Representations, ICLR 2018 ; Conference date: 30-04-2018 Through 03-05-2018",

}

TY - CONF

T1 - Stochastic variational video prediction

AU - Babaeizadeh, Mohammad

AU - Finn, Chelsea

AU - Erhan, Dumitru

AU - Campbell, R H

AU - Levine, Sergey

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world videos. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.

AB - Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world videos. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.

UR - http://www.scopus.com/inward/record.url?scp=85071170165&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071170165&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85071170165

ER -