TY - GEN
T1 - XSP
T2 - 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020
AU - Li, Cheng
AU - Dakkak, Abdul
AU - Xiong, Jinjun
AU - Wei, Wei
AU - Xu, Lingjie
AU - Hwu, Wen Mei
N1 - Funding Information:
This work is supported by the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) - a member of the IBM Cognitive Horizon Network, and the Applications Driving Architectures (ADA) Research Center - one of the JUMP Centers co-sponsored by SRC and DARPA.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack). Existing profiling tools are disjoint, however, and only focus on profiling within a particular level of the stack, which limits the thoroughness and usefulness of the profiling results.This paper proposes XSP-an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. We couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. We demonstrate that XSP provides insights which would be difficult to discern otherwise.
AB - There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack). Existing profiling tools are disjoint, however, and only focus on profiling within a particular level of the stack, which limits the thoroughness and usefulness of the profiling results.This paper proposes XSP-an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. We couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. We demonstrate that XSP provides insights which would be difficult to discern otherwise.
UR - http://www.scopus.com/inward/record.url?scp=85088901165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088901165&partnerID=8YFLogxK
U2 - 10.1109/IPDPS47924.2020.00042
DO - 10.1109/IPDPS47924.2020.00042
M3 - Conference contribution
AN - SCOPUS:85088901165
T3 - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
SP - 326
EP - 327
BT - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 May 2020 through 22 May 2020
ER -