TY - GEN
T1 - A Performance Prediction-based DNN Partitioner for Edge TPU Pipelining
AU - Zou, Bohua
AU - Sun, Binqi
AU - Hu, Yigong
AU - Kloda, Tomasz
AU - Caccamo, Marco
AU - Abdelzaher, Tarek
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Intelligent IoT applications deployed in adversarial environments often operate without reliable cloud connections, requiring local execution of AI pipelines on resource-constrained edge devices. The Edge Tensor Processing Unit (TPU) is a specialized AI hardware accelerator known for its low power consumption and high computational efficiency. To optimize DNN performance across multiple Edge TPUs, DNN models are often pipelined by partitioning them into segments. However, uneven workload distribution across these segments can lead to latency bottlenecks, reducing overall throughput and increasing memory accesses due to the limited on-chip memory. This issue is especially concerning in mission-critical applications, where minimizing memory contention and ensuring robust performance are essential. To overcome these challenges, we develop a novel performance prediction-based partitioning tool for DNN models on Edge TPU pipelines. This tool uses a Transformer-based model to accurately predict the inference time of individual DNN segments, enabling more efficient partitioning. We introduce two methods: one relying solely on the prediction model and another combining prediction with profiling. Tested on 120 models from the NASBench-101 dataset, both methods significantly improved partitioning robustness and efficiency, reducing solving time by up to 98.86% and 97.21%, respectively, compared to traditional profiling-based approaches, while maintaining comparable bottleneck latencies.
AB - Intelligent IoT applications deployed in adversarial environments often operate without reliable cloud connections, requiring local execution of AI pipelines on resource-constrained edge devices. The Edge Tensor Processing Unit (TPU) is a specialized AI hardware accelerator known for its low power consumption and high computational efficiency. To optimize DNN performance across multiple Edge TPUs, DNN models are often pipelined by partitioning them into segments. However, uneven workload distribution across these segments can lead to latency bottlenecks, reducing overall throughput and increasing memory accesses due to the limited on-chip memory. This issue is especially concerning in mission-critical applications, where minimizing memory contention and ensuring robust performance are essential. To overcome these challenges, we develop a novel performance prediction-based partitioning tool for DNN models on Edge TPU pipelines. This tool uses a Transformer-based model to accurately predict the inference time of individual DNN segments, enabling more efficient partitioning. We introduce two methods: one relying solely on the prediction model and another combining prediction with profiling. Tested on 120 models from the NASBench-101 dataset, both methods significantly improved partitioning robustness and efficiency, reducing solving time by up to 98.86% and 97.21%, respectively, compared to traditional profiling-based approaches, while maintaining comparable bottleneck latencies.
KW - DNN Partition
KW - Edge TPU
KW - Pipelining
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85214575278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85214575278&partnerID=8YFLogxK
U2 - 10.1109/MILCOM61039.2024.10773756
DO - 10.1109/MILCOM61039.2024.10773756
M3 - Conference contribution
AN - SCOPUS:85214575278
T3 - Proceedings - IEEE Military Communications Conference MILCOM
BT - 2024 IEEE Military Communications Conference, MILCOM 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE Military Communications Conference, MILCOM 2024
Y2 - 28 October 2024 through 1 November 2024
ER -