As computer vision tasks target increasingly challenging scenarios, the need for real-time image processing grows as well, requiring more efficient methods to accelerate convolutional neural networks. For unit-stride convolutions, FFT-based methods and Winograd algorithms can be used to compute matrix convolutions; they lower the computational complexity by reducing the number of multiplications. For non-unit-stride convolutions, however, these algorithms usually cannot be applied directly. In this work, we propose a novel, universal approach for constructing non-unit-stride convolution algorithms from Winograd algorithms for any given stride and filter size. Specifically, we first demonstrate how to decompose an arbitrary convolutional kernel and apply Winograd algorithms separately to the resulting parts to compute non-unit-stride convolutions. We then present the derivation of this method and a proof by construction that confirms its validity. Finally, we discuss the minimum number of multiplications and additions necessary for non-unit-stride convolutions and evaluate the performance of the decomposed Winograd algorithms. Our analysis of the computational complexity shows that the new approach requires 1.5x to 3x fewer multiplications. In experiments on real DNN layers, the decomposed Winograd algorithms achieve a speedup (Told / Tnew) of around 1.3x over the conventional convolution algorithm across various experimental settings.
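To make the decomposition idea concrete, the following is a minimal 1D sketch in Python with NumPy, not the implementation evaluated in this work. It assumes the CNN convention of cross-correlation with no padding, and the function names `strided_corr1d` and `decomposed_corr1d` are hypothetical. A stride-s filter is split into s sub-filters by taking every s-th tap, the input is subsampled in the same way, and each pair is then combined by an ordinary unit-stride correlation, which is the kind of sub-problem a Winograd algorithm could be applied to.

```python
import numpy as np


def strided_corr1d(x, w, stride):
    """Reference: direct strided cross-correlation without padding."""
    n_out = (len(x) - len(w)) // stride + 1
    return np.array(
        [np.dot(x[i * stride : i * stride + len(w)], w) for i in range(n_out)]
    )


def decomposed_corr1d(x, w, stride):
    """Same result, computed as a sum of unit-stride correlations.

    Writing the tap index as k = stride * j + p gives
        y[i] = sum_p sum_j x[stride * (i + j) + p] * w[stride * j + p],
    so phase p contributes a unit-stride correlation of x[p::stride]
    with w[p::stride].
    """
    n_out = (len(x) - len(w)) // stride + 1
    y = np.zeros(n_out)
    for p in range(stride):
        w_p = w[p::stride]  # sub-filter for phase p
        if len(w_p) == 0:   # happens when stride exceeds the filter size
            continue
        x_p = x[p::stride]  # matching subsampled input
        # Unit-stride sub-problem; a Winograd kernel could replace this call.
        partial = np.correlate(x_p, w_p, mode="valid")
        y += partial[:n_out]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal(17)
w = rng.standard_normal(5)
direct = strided_corr1d(x, w, stride=2)
decomposed = decomposed_corr1d(x, w, stride=2)
print(np.allclose(direct, decomposed))
```

With stride 2 and a 5-tap filter, the sub-filters have 3 and 2 taps, so the two unit-stride sub-problems have different sizes. The sketch only checks that the decomposition reproduces the direct result and does not itself reduce the multiplication count.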