TY - JOUR
T1 - Determining optimal channel partition for 2:4 fine grained structured sparsity
AU - Mahajan, Mohit
AU - Hwu, Wen-Mei
AU - Nagi, Rakesh
N1 - We acknowledge the immense contributions of Dr. Jeff Pool, NVIDIA, who introduced us to this problem and supported the numerical testing by sharing the APEX repository and his expertise. Rakesh Nagi also acknowledges NVIDIA for an equipment gift under the Applied Accelerator Program.
PY - 2024/12
Y1 - 2024/12
N2 - Deep Neural Networks (DNNs) have demonstrated tremendous success in many applications, but incur a high computational burden at inference time. The 2:4 sparsity pruning method has recently been developed to effectively compress and accelerate DNNs with little to no loss in performance. The method comprises a training phase followed by a pruning step in which 2 out of every 4 consecutive weights are eliminated to obtain a pruned matrix; the network is then retrained to fine-tune the remaining weights. The accuracy of the resulting sparse network is maximized by permuting the weight matrix along the channel dimension so as to maximize the total magnitude of the weights preserved during pruning. While earlier works have proposed heuristic methods to generate good permutations, we formalize the problem as a discrete optimization problem. In this paper, we propose four different mathematical programs to determine the optimal permutations and compare their performance on small-sized instances using a standard solver. Further, we develop a complementary column generation scheme to handle DNNs with a realistic number of channels.
AB - Deep Neural Networks (DNNs) have demonstrated tremendous success in many applications, but incur a high computational burden at inference time. The 2:4 sparsity pruning method has recently been developed to effectively compress and accelerate DNNs with little to no loss in performance. The method comprises a training phase followed by a pruning step in which 2 out of every 4 consecutive weights are eliminated to obtain a pruned matrix; the network is then retrained to fine-tune the remaining weights. The accuracy of the resulting sparse network is maximized by permuting the weight matrix along the channel dimension so as to maximize the total magnitude of the weights preserved during pruning. While earlier works have proposed heuristic methods to generate good permutations, we formalize the problem as a discrete optimization problem. In this paper, we propose four different mathematical programs to determine the optimal permutations and compare their performance on small-sized instances using a standard solver. Further, we develop a complementary column generation scheme to handle DNNs with a realistic number of channels.
KW - Channel permutations
KW - Column generation
KW - Mathematical programming
KW - N:M fine grained structured sparsity
UR - http://www.scopus.com/inward/record.url?scp=85182213022&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182213022&partnerID=8YFLogxK
U2 - 10.1007/s11590-023-02084-8
DO - 10.1007/s11590-023-02084-8
M3 - Article
AN - SCOPUS:85182213022
SN - 1862-4472
VL - 18
SP - 2079
EP - 2090
JO - Optimization Letters
JF - Optimization Letters
IS - 9
ER -