TY - GEN
T1 - Accelerate Non-unit Stride Convolutions with Winograd Algorithms
AU - Pan, Junhao
AU - Chen, Deming
N1 - Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/1/18
Y1 - 2021/1/18
N2 - While computer vision tasks target increasingly challenging scenarios, the need for real-time processing of images rises as well, requiring more efficient methods to accelerate convolutional neural networks. For unit stride convolutions, we use FFT-based methods and Winograd algorithms to compute matrix convolutions, which effectively lower the computing complexity by reducing the number of multiplications. For non-unit stride convolutions, we usually cannot directly apply those algorithms to accelerate the computations. In this work, we propose a novel universal approach to construct the non-unit stride convolution algorithms for any given stride and filter sizes from Winograd algorithms. Specifically, we first demonstrate the steps to decompose an arbitrary convolutional kernel and apply the Winograd algorithms separately to compute non-unit stride convolutions. We then present the derivation of this method and a proof by construction to confirm the validity of this approach. Finally, we discuss the minimum number of multiplications and additions necessary for the non-unit stride convolutions and evaluate the performance of the decomposed Winograd algorithms. From our analysis of the computational complexity, the new approach can benefit from 1.5x to 3x fewer multiplications. In our experiments on real DNN layers, we have acquired around 1.3x speedup (T_old / T_new) of the Winograd algorithms against the conventional convolution algorithm in various experiment settings.
AB - While computer vision tasks target increasingly challenging scenarios, the need for real-time processing of images rises as well, requiring more efficient methods to accelerate convolutional neural networks. For unit stride convolutions, we use FFT-based methods and Winograd algorithms to compute matrix convolutions, which effectively lower the computing complexity by reducing the number of multiplications. For non-unit stride convolutions, we usually cannot directly apply those algorithms to accelerate the computations. In this work, we propose a novel universal approach to construct the non-unit stride convolution algorithms for any given stride and filter sizes from Winograd algorithms. Specifically, we first demonstrate the steps to decompose an arbitrary convolutional kernel and apply the Winograd algorithms separately to compute non-unit stride convolutions. We then present the derivation of this method and a proof by construction to confirm the validity of this approach. Finally, we discuss the minimum number of multiplications and additions necessary for the non-unit stride convolutions and evaluate the performance of the decomposed Winograd algorithms. From our analysis of the computational complexity, the new approach can benefit from 1.5x to 3x fewer multiplications. In our experiments on real DNN layers, we have acquired around 1.3x speedup (T_old / T_new) of the Winograd algorithms against the conventional convolution algorithm in various experiment settings.
UR - http://www.scopus.com/inward/record.url?scp=85100524426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100524426&partnerID=8YFLogxK
U2 - 10.1145/3394885.3431534
DO - 10.1145/3394885.3431534
M3 - Conference contribution
AN - SCOPUS:85100524426
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 358
EP - 364
BT - Proceedings of the 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021
Y2 - 18 January 2021 through 21 January 2021
ER -