TY - GEN
T1 - DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression
T2 - 28th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2019
AU - Jin, Sian
AU - Di, Sheng
AU - Liang, Xin
AU - Tian, Jiannan
AU - Tao, Dingwen
AU - Cappello, Franck
N1 - Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations - the Office of Science and the National Nuclear Security Administration - responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation's exascale computing imperative. This material was based upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and also supported by the National Science Foundation under Grant No. 1619253. We gratefully acknowledge the support from the Alabama Water Institute (AWI), the Remote Sensing Center (RSC), and the Center for Complex Hydrosystems Research (CCHR).
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/6/17
Y1 - 2019/6/17
N2 - Today's deep neural networks (DNNs) are becoming deeper and wider because of increasing demands on analysis quality and increasingly complex applications to solve. Wide and deep DNNs, however, require large amounts of resources (such as memory, storage, and I/O), significantly restricting their use on resource-constrained platforms. Although some DNN simplification methods (such as weight quantization) have been proposed to address this issue, they suffer from either low compression ratios or high compression errors, which may introduce expensive fine-tuning overhead (i.e., a costly retraining process to recover the target inference accuracy). In this paper, we propose DeepSZ: an accuracy-loss expected neural network compression framework, which involves four key steps: network pruning, error bound assessment, optimization for error bound configuration, and compressed model generation, featuring a high compression ratio and low encoding time. The contribution is threefold. (1) We develop an adaptive approach to select feasible error bounds for each layer. (2) We build a model to estimate the overall loss of inference accuracy based on the inference accuracy degradation caused by individual decompressed layers. (3) We develop an efficient optimization algorithm to determine the best-fit configuration of error bounds in order to maximize the compression ratio under the user-set inference accuracy constraint. Experiments show that DeepSZ can compress AlexNet and VGG-16 on the ImageNet dataset by compression ratios of 46× and 116×, respectively, and compress LeNet-300-100 and LeNet-5 on the MNIST dataset by compression ratios of 57× and 56×, respectively, with at most 0.3% loss of inference accuracy. Compared with other state-of-the-art methods, DeepSZ improves the compression ratio by up to 1.43×, DNN encoding performance by up to 4.0× (with four V100 GPUs), and decoding performance by up to 6.2×.
AB - Today's deep neural networks (DNNs) are becoming deeper and wider because of increasing demands on analysis quality and increasingly complex applications to solve. Wide and deep DNNs, however, require large amounts of resources (such as memory, storage, and I/O), significantly restricting their use on resource-constrained platforms. Although some DNN simplification methods (such as weight quantization) have been proposed to address this issue, they suffer from either low compression ratios or high compression errors, which may introduce expensive fine-tuning overhead (i.e., a costly retraining process to recover the target inference accuracy). In this paper, we propose DeepSZ: an accuracy-loss expected neural network compression framework, which involves four key steps: network pruning, error bound assessment, optimization for error bound configuration, and compressed model generation, featuring a high compression ratio and low encoding time. The contribution is threefold. (1) We develop an adaptive approach to select feasible error bounds for each layer. (2) We build a model to estimate the overall loss of inference accuracy based on the inference accuracy degradation caused by individual decompressed layers. (3) We develop an efficient optimization algorithm to determine the best-fit configuration of error bounds in order to maximize the compression ratio under the user-set inference accuracy constraint. Experiments show that DeepSZ can compress AlexNet and VGG-16 on the ImageNet dataset by compression ratios of 46× and 116×, respectively, and compress LeNet-300-100 and LeNet-5 on the MNIST dataset by compression ratios of 57× and 56×, respectively, with at most 0.3% loss of inference accuracy. Compared with other state-of-the-art methods, DeepSZ improves the compression ratio by up to 1.43×, DNN encoding performance by up to 4.0× (with four V100 GPUs), and decoding performance by up to 6.2×.
KW - Deep learning
KW - Lossy compression
KW - Neural networks
KW - Performance
UR - http://www.scopus.com/inward/record.url?scp=85069160742&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069160742&partnerID=8YFLogxK
U2 - 10.1145/3307681.3326608
DO - 10.1145/3307681.3326608
M3 - Conference contribution
AN - SCOPUS:85069160742
T3 - HPDC 2019 - Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
SP - 159
EP - 170
BT - HPDC 2019 - Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery
Y2 - 22 June 2019 through 29 June 2019
ER -