TY - GEN
T1 - An empirical investigation of efficient spatio-temporal modeling in video restoration
AU - Fan, Yuchen
AU - Yu, Jiahui
AU - Liu, Ding
AU - Huang, Thomas S.
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - We present a comprehensive empirical investigation of efficient spatio-temporal modeling in video restoration tasks. To achieve a better speed-accuracy trade-off, our investigation covers the intersection of three dimensions in deep video restoration networks: spatial-wise, channel-wise and temporal-wise. We enumerate various network architectures ranging from 2D convolutional models to their 3D extensions, and discuss their gain and loss in terms of training time, model size, boundary effects, prediction accuracy and the visual quality of restored videos. Under a strictly controlled computational budget, we also specifically explore the design inside each residual building block in a video restoration network, which consists of a mixture of 2D and 3D convolutional layers. Our findings are summarized as follows: (1) In 3D convolutional models, setting more computation/channels for spatial convolution leads to better performance than on temporal convolution. (2) The best variant of 3D convolutional models is better than 2D convolutional models, but the performance gap is close. (3) In a very limited range, the performance can be improved by the increase of temporal window size (5 frames for 2D model) or padding size (6 frames for 3D model). Based on these findings, we propose the wide-activated 3D convolutional network for video restoration (WDVR), which achieves state-of-the-art restoration accuracy under constrained computational budgets with low runtime latency. Our solution based on WDVR also won 2nd place in three out of four tracks of the NTIRE 2019 Challenge for Video Super-Resolution and Deblurring. Code and models are released at https://github.com/ychfan/wdvr-ntire2019.
AB - We present a comprehensive empirical investigation of efficient spatio-temporal modeling in video restoration tasks. To achieve a better speed-accuracy trade-off, our investigation covers the intersection of three dimensions in deep video restoration networks: spatial-wise, channel-wise and temporal-wise. We enumerate various network architectures ranging from 2D convolutional models to their 3D extensions, and discuss their gain and loss in terms of training time, model size, boundary effects, prediction accuracy and the visual quality of restored videos. Under a strictly controlled computational budget, we also specifically explore the design inside each residual building block in a video restoration network, which consists of a mixture of 2D and 3D convolutional layers. Our findings are summarized as follows: (1) In 3D convolutional models, setting more computation/channels for spatial convolution leads to better performance than on temporal convolution. (2) The best variant of 3D convolutional models is better than 2D convolutional models, but the performance gap is close. (3) In a very limited range, the performance can be improved by the increase of temporal window size (5 frames for 2D model) or padding size (6 frames for 3D model). Based on these findings, we propose the wide-activated 3D convolutional network for video restoration (WDVR), which achieves state-of-the-art restoration accuracy under constrained computational budgets with low runtime latency. Our solution based on WDVR also won 2nd place in three out of four tracks of the NTIRE 2019 Challenge for Video Super-Resolution and Deblurring. Code and models are released at https://github.com/ychfan/wdvr-ntire2019.
UR - http://www.scopus.com/inward/record.url?scp=85083323350&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083323350&partnerID=8YFLogxK
U2 - 10.1109/CVPRW.2019.00269
DO - 10.1109/CVPRW.2019.00269
M3 - Conference contribution
AN - SCOPUS:85083323350
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 2159
EP - 2168
BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
PB - IEEE Computer Society
T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Y2 - 16 June 2019 through 20 June 2019
ER -