Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms

Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cloud platforms today have been deploying hardware accelerators like neural processing units (NPUs) for powering machine learning (ML) inference services. To maximize the resource utilization while ensuring reasonable quality of service, a natural approach is to virtualize NPUs for efficient resource sharing for multi-tenant ML services. However, virtualizing NPUs for modern cloud platforms is not easy. This is not only due to the lack of system abstraction support for NPU hardware, but also due to the lack of architectural and ISA support for enabling fine-grained dynamic operator scheduling for virtualized NPUs. We present Neu10, a holistic NPU virtualization framework. We investigate virtualization techniques for NPUs across the entire software and hardware stack. Neu10 consists of (1) a flexible NPU abstraction called vNPU, which enables fine-grained virtualization of the heterogeneous compute units in a physical NPU (pNPU); (2) a vNPU resource allocator that enables pay-as-you-go computing model and flexible vNPU-to-pNPU mappings for improved resource utilization and cost-effectiveness; (3) an ISA extension of modern NPU architecture for facilitating fine-grained tensor operator scheduling for multiple vNPUs. We implement Neu10 based on a production-level NPU simulator. Our experiments show that Neu10 improves the throughput of ML inference services by up to 1.4× and reduces the tail latency by up to 4.6×, while improving the NPU utilization by 1.2× on average, compared to state-of-the-art NPU sharing approaches.

Original language: English (US)
Title of host publication: Proceedings - 2024 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
Publisher: IEEE Computer Society
Pages: 1-16
Number of pages: 16
ISBN (Electronic): 9798350350579
DOIs
State: Published - 2024
Event: 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024 - Austin, United States
Duration: Nov 2 2024 → Nov 6 2024

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451

Conference

Conference: 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
Country/Territory: United States
City: Austin
Period: 11/2/24 → 11/6/24

Keywords

  • machine learning accelerator
  • neural processing unit
  • virtualization

ASJC Scopus subject areas

  • Hardware and Architecture
