TY - GEN
T1 - DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs
T2 - 11th ACM/SPEC International Conference on Performance Engineering, ICPE 2020
AU - Li, Cheng
AU - Dakkak, Abdul
AU - Xiong, Jinjun
AU - Hwu, Wen-mei
N1 - This work is supported by the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR), a member of the IBM Cognitive Horizon Network, and by the Applications Driving Architectures (ADA) Research Center, one of the JUMP Centers co-sponsored by SRC and DARPA.
PY - 2020/4/20
AB - The past few years have seen a surge in the use of Deep Learning (DL) models for a wide array of tasks such as image classification, object detection, and machine translation. While DL models provide an opportunity to solve otherwise intractable tasks, their adoption relies on them being optimized to meet target latency and resource requirements. Benchmarking is a key step in this process but has been hampered in part by the lack of representative and up-to-date benchmarking suites. This paper proposes DLBricks, a composable benchmark generation design that reduces the effort of developing, maintaining, and running DL benchmarks. DLBricks decomposes DL models into a set of unique runnable networks and constructs the original model's performance using the performance of the generated benchmarks. Since benchmarks are generated automatically and the benchmarking time is minimized, DLBricks can keep up to date with the latest proposed models, relieving the pressure of selecting representative DL models. We evaluate DLBricks using 50 MXNet models spanning 5 DL tasks on 4 representative CPU systems. We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g., within 95% accuracy and with up to a 4.4× benchmarking-time speedup on Amazon EC2 c5.xlarge).
KW - Benchmarking
KW - Deep learning
KW - Performance measurement
UR - https://www.scopus.com/pages/publications/85085943026
DO - 10.1145/3358960.3379143
M3 - Conference contribution
AN - SCOPUS:85085943026
SP - 202
EP - 209
BT - ICPE 2020 - Proceedings of the ACM/SPEC International Conference on Performance Engineering
PB - Association for Computing Machinery
Y2 - 20 April 2020 through 24 April 2020
ER -