An empirical investigation of efficient spatio-temporal modeling in video restoration

Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a comprehensive empirical investigation of efficient spatio-temporal modeling in video restoration tasks. To achieve a better speed-accuracy trade-off, our investigation covers the intersection of three dimensions in deep video restoration networks: spatial-wise, channel-wise and temporal-wise. We enumerate various network architectures ranging from 2D convolutional models to their 3D extensions, and discuss their gains and losses in terms of training time, model size, boundary effects, prediction accuracy and the visual quality of restored videos. Under a strictly controlled computational budget, we also specifically explore the design inside each residual building block in a video restoration network, which consists of a mixture of 2D and 3D convolutional layers. Our findings are summarized as follows: (1) In 3D convolutional models, allocating more computation/channels to spatial convolution leads to better performance than allocating them to temporal convolution. (2) The best variant of 3D convolutional models is better than 2D convolutional models, but the performance gap is small. (3) Within a very limited range, performance can be improved by increasing the temporal window size (5 frames for the 2D model) or the padding size (6 frames for the 3D model). Based on these findings, we propose the wide-activated 3D convolutional network for video restoration (WDVR), which achieves state-of-the-art restoration accuracy under constrained computational budgets with low runtime latency. Our solution based on WDVR also won 2nd place in three out of four tracks of the NTIRE 2019 Challenge for Video Super-Resolution and Deblurring. Code and models are released at https://github.com/ychfan/wdvr-ntire2019.
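The abstract describes residual building blocks that mix 2D (spatial) and 3D (temporal) convolutional layers under a fixed budget, combined with wide activation. The following is a minimal illustrative sketch of such a block, assuming PyTorch; the class name, channel widths, expansion factor, and layer ordering are hypothetical choices for illustration, not the authors' exact WDVR configuration.

```python
# Hypothetical sketch of a wide-activated residual block mixing spatial
# and temporal 3D convolutions, assuming PyTorch. Widths and ordering
# are illustrative, not the authors' exact WDVR design.
import torch
import torch.nn as nn


class WideActivatedBlock3D(nn.Module):
    """Residual block that expands channels before the activation
    ("wide activation"), then applies a spatial-only (1x3x3) and a
    temporal-only (3x1x1) convolution before adding the skip path."""

    def __init__(self, channels=32, expansion=4):
        super().__init__()
        wide = channels * expansion
        # 1x1x1 conv expands channels so the ReLU sees a wider feature map.
        self.expand = nn.Conv3d(channels, wide, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        # Spatial-only kernel over (T, H, W): 1x3x3, padded to keep H and W.
        self.spatial = nn.Conv3d(wide, channels,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Temporal-only kernel: 3x1x1, padded to keep the frame count T.
        self.temporal = nn.Conv3d(channels, channels,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):
        out = self.temporal(self.spatial(self.act(self.expand(x))))
        return x + out  # residual connection


# Usage: one clip with 32 channels, 5 frames, 16x16 spatial patches.
block = WideActivatedBlock3D(channels=32)
y = block(torch.randn(1, 32, 5, 16, 16))
print(tuple(y.shape))  # shape is preserved: (1, 32, 5, 16, 16)
```

Splitting the full 3x3x3 kernel into a 1x3x3 spatial part and a 3x1x1 temporal part makes the spatial/temporal computation ratio an explicit design knob, which is the trade-off the paper's finding (1) concerns.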

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Publisher: IEEE Computer Society
Pages: 2159-2168
Number of pages: 10
ISBN (Electronic): 9781728125060
DOI: 10.1109/CVPRW.2019.00269
State: Published - Jun 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 - Long Beach, United States
Duration: Jun 16, 2019 - Jun 20, 2019

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume: 2019-June
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Country: United States
City: Long Beach
Period: 6/16/19 - 6/20/19

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering


  • Cite this

Fan, Y., Yu, J., Liu, D., & Huang, T. S. (2019). An empirical investigation of efficient spatio-temporal modeling in video restoration. In Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 (pp. 2159-2168). [9025696] (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2019-June). IEEE Computer Society. https://doi.org/10.1109/CVPRW.2019.00269