Power-aware Deep Learning Model Serving with µ-Serve

Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the increasing popularity of large deep learning model-serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while still satisfying throughput or model-serving latency requirements. Model multiplexing approaches such as model parallelism, model placement, replication, and batching aim to optimize model-serving performance. However, they fall short of leveraging the GPU frequency scaling opportunity for power saving. In this paper, we demonstrate (1) the benefits of GPU frequency scaling in power saving for model serving; and (2) the necessity for co-design and optimization of fine-grained model multiplexing and GPU frequency scaling. We explore the co-design space and present a novel power-aware model-serving system, µ-Serve. µ-Serve is a model-serving framework that optimizes the power consumption and model-serving latency/throughput of serving multiple ML models efficiently in a homogeneous GPU cluster. Evaluation results on production workloads show that µ-Serve achieves 1.2–2.6× power saving by dynamic GPU frequency scaling (up to 61% reduction) without SLO attainment violations.

Original language: English (US)
Title of host publication: Proceedings of the 2024 USENIX Annual Technical Conference, ATC 2024
Publisher: USENIX Association
Pages: 75-93
Number of pages: 19
ISBN (Electronic): 9781939133410
State: Published - 2024
Event: 2024 USENIX Annual Technical Conference, ATC 2024 - Santa Clara, United States
Duration: Jul 10 2024 → Jul 12 2024

Publication series

Name: Proceedings of the 2024 USENIX Annual Technical Conference, ATC 2024

Conference

Conference: 2024 USENIX Annual Technical Conference, ATC 2024
Country/Territory: United States
City: Santa Clara
Period: 7/10/24 → 7/12/24

ASJC Scopus subject areas

  • General Computer Science
