Determining optimal channel partition for 2:4 fine grained structured sparsity

Mohit Mahajan, Wen-Mei Hwu, Rakesh Nagi

Research output: Contribution to journal › Article › peer-review

Abstract

Deep Neural Networks (DNNs) have demonstrated tremendous success in many applications but incur a high computational burden at inference time. The 2:4 sparsity pruning method has recently been developed to effectively compress and accelerate DNNs with little to no loss in performance. The method comprises a training phase followed by a pruning step in which 2 out of every 4 consecutive weights are eliminated to obtain a pruned matrix, which is then retrained to fine-tune the remaining weights. The accuracy of the resultant sparse network is maximized by permuting the weight matrix along the channel dimension so as to maximize the total magnitude of the weights preserved during pruning. While earlier works have proposed heuristic methods to generate good permutations, we formalize the problem as a discrete optimization problem. In this paper, we propose four different mathematical programs to determine the optimal permutations and compare their performance on small instances using a standard solver. Further, we develop a complementary column generation scheme to solve DNNs with a realistic number of channels.
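To make the pruning step concrete, the sketch below (a minimal illustration, not the authors' implementation) zeroes the two smallest-magnitude weights in every group of four consecutive entries along each row and reports the total magnitude preserved; the brute-force permutation search is only a stand-in for the mathematical programs and column generation scheme developed in the paper, and the function names are assumptions for this example.

```python
import itertools
import numpy as np

def prune_2_4(W):
    """Zero the 2 smallest-magnitude weights in every group of 4 consecutive
    entries along each row. Returns the pruned matrix and the total
    magnitude preserved."""
    W = np.asarray(W, dtype=float)
    rows, cols = W.shape
    assert cols % 4 == 0, "number of columns must be a multiple of 4"
    pruned = np.zeros_like(W)
    for r in range(rows):
        for g in range(0, cols, 4):
            block = W[r, g:g + 4]
            keep = np.argsort(np.abs(block))[-2:]   # indices of the 2 largest magnitudes
            pruned[r, g + keep] = block[keep]
    return pruned, np.abs(pruned).sum()

def best_channel_permutation(W):
    """Brute-force search over column (channel) permutations that maximizes
    the magnitude preserved by 2:4 pruning. Feasible only for a handful of
    channels; illustrative of the objective, not of the paper's method."""
    cols = W.shape[1]
    best_perm, best_val = None, -np.inf
    for perm in itertools.permutations(range(cols)):
        _, val = prune_2_4(W[:, list(perm)])
        if val > best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val

# Tiny example: 2 output channels, 8 input channels (two groups of 4 per row)
W = np.array([[0.9, -0.1, 0.8, 0.05, 0.3, -0.7, 0.02, 0.4],
              [0.2, 0.7, -0.6, 0.1, 0.5, 0.05, -0.9, 0.3]])
pruned, kept = prune_2_4(W)
perm, best_kept = best_channel_permutation(W)
print(kept, best_kept, perm)
```

Because the group boundaries fall every four columns, reordering the channels can move large-magnitude weights into different groups and strictly increase the preserved magnitude, which is exactly the quantity the paper's optimization models maximize.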

Original language: English (US)
Pages (from-to): 2079-2090
Number of pages: 12
Journal: Optimization Letters
Volume: 18
Issue number: 9
DOIs
State: Published - Dec 2024
Externally published: Yes

Keywords

  • Channel permutations
  • Column generation
  • Mathematical programming
  • N:M fine grained structured sparsity

ASJC Scopus subject areas

  • Business, Management and Accounting (miscellaneous)
  • Control and Optimization
