Abstract
Deep Neural Networks (DNNs) have demonstrated tremendous success in many applications, but incur a high computational burden at inference time. The 2:4 sparsity pruning method was recently developed to compress and accelerate DNNs with little to no loss in performance. The method comprises a training phase followed by a pruning step, in which 2 out of every 4 consecutive weights are eliminated to obtain a pruned matrix; the remaining weights are then fine-tuned by retraining. The accuracy of the resulting sparse network is improved by permuting the weight matrix along the channel dimension so as to maximize the total magnitude of the weights preserved during pruning. While earlier works proposed heuristics to generate good permutations, we formalize the problem as a discrete optimization problem. In this paper, we propose four different mathematical programs to determine optimal permutations and compare their performance on small instances using a standard solver. Further, we develop a complementary column generation scheme that scales to DNNs with a realistic number of channels.
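To make the objective concrete, the following is a minimal NumPy sketch (not taken from the paper) of 2:4 magnitude pruning on a weight matrix and of the quantity the permutation is chosen to maximize: the total magnitude retained after pruning the column-permuted matrix. The function names, toy dimensions, and the brute-force comparison of two permutations are illustrative assumptions, not the paper's mathematical programs or column generation scheme.

```python
import numpy as np

def prune_2_4(weights):
    """2:4 magnitude pruning: in each group of 4 consecutive weights
    along the input-channel axis, keep the 2 largest in magnitude
    and zero out the other 2."""
    out_ch, in_ch = weights.shape
    assert in_ch % 4 == 0, "input channels must be a multiple of 4"
    pruned = weights.copy()
    groups = pruned.reshape(out_ch, in_ch // 4, 4)
    # indices of the 2 smallest-magnitude entries in each group of 4
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return pruned

def retained_magnitude(weights, perm):
    """Total weight magnitude preserved after permuting the input
    channels by `perm` and then applying 2:4 pruning -- the objective
    the channel-permutation problem seeks to maximize."""
    permuted = weights[:, perm]
    return np.abs(prune_2_4(permuted)).sum()

# Toy example: compare the identity permutation with a random one.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
identity = np.arange(W.shape[1])
random_perm = rng.permutation(W.shape[1])
print(retained_magnitude(W, identity), retained_magnitude(W, random_perm))
```

Searching over all permutations in this way is intractable for realistic channel counts, which is why the paper formulates the problem as mathematical programs and solves larger instances via column generation.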
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 2079-2090 |
| Number of pages | 12 |
| Journal | Optimization Letters |
| Volume | 18 |
| Issue number | 9 |
| DOIs | |
| State | Published - Dec 2024 |
| Externally published | Yes |
Keywords
- Channel permutations
- Column generation
- Mathematical programming
- N:M fine-grained structured sparsity
ASJC Scopus subject areas
- Business, Management and Accounting (miscellaneous)
- Control and Optimization