We present a comprehensive empirical investigation of efficient spatio-temporal modeling in video restoration tasks. To achieve a better speed-accuracy trade-off, our investigation covers the intersection of three dimensions in deep video restoration networks: spatial-wise, channel-wise and temporal-wise. We enumerate various network architectures ranging from 2D convolutional models to their 3D extensions, and discuss their gains and losses in terms of training time, model size, boundary effects, prediction accuracy and the visual quality of restored videos. Under a strictly controlled computational budget, we also specifically explore the design of each residual building block in a video restoration network, which consists of a mixture of 2D and 3D convolutional layers. Our findings are summarized as follows: (1) In 3D convolutional models, allocating more computation/channels to spatial convolution leads to better performance than allocating them to temporal convolution. (2) The best variant of the 3D convolutional models outperforms the 2D convolutional models, but the performance gap is small. (3) Within a limited range, performance improves with a larger temporal window size (up to 5 frames for the 2D model) or padding size (up to 6 frames for the 3D model). Based on these findings, we propose the wide-activated 3D convolutional network for video restoration (WDVR), which achieves state-of-the-art restoration accuracy under constrained computational budgets with low runtime latency. Our solution based on WDVR also won second place in three out of four tracks of the NTIRE 2019 Challenge for Video Super-Resolution and Deblurring. Code and models are released at https://github.com/ychfan/wdvr-ntire2019.
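To make the "wide-activated" residual block idea concrete, the following is a minimal NumPy sketch, not the released WDVR implementation: channels are expanded by a ratio r before the ReLU and reduced afterwards, with an identity skip connection. For brevity the spatial/temporal 3x3 convolutions are replaced here by pointwise (1x1x1) convolutions, i.e. matrix multiplications over the channel axis; the function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def wide_activation_block(x, w_expand, w_reduce, res_scale=1.0):
    """Sketch of a wide-activation residual block:
    expand channels before the ReLU, then reduce back, plus identity skip.
    x: feature map of shape (C, T, H, W); w_expand: (r*C, C); w_reduce: (C, r*C).
    Pointwise convs stand in for the real spatial/temporal convolutions."""
    z = np.einsum('oc,cthw->othw', w_expand, x)   # 1x1x1 conv: C -> r*C
    z = np.maximum(z, 0.0)                        # wide ReLU activation
    z = np.einsum('oc,cthw->othw', w_reduce, z)   # 1x1x1 conv: r*C -> C
    return x + res_scale * z                      # residual connection

# Usage: 16 channels, expansion ratio 4, a 5-frame 32x32 clip (hypothetical sizes).
rng = np.random.default_rng(0)
C, r = 16, 4
x = rng.standard_normal((C, 5, 32, 32))
w_up = rng.standard_normal((r * C, C)) * 0.1
w_down = rng.standard_normal((C, r * C)) * 0.1
y = wide_activation_block(x, w_up, w_down, res_scale=0.1)
```

The widened intermediate (r*C channels) gives the activation more capacity without changing the block's input/output width, which is what keeps the computational budget comparable across the 2D and 3D variants studied in the paper.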