Model compression for edge computing

Shuochao Yao, Tarek Abdelzaher

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

We are now ready to address what is arguably the primary inference challenge in Edge AI applications, namely, compressing an AI model to fit the limited resources of edge devices. Many traditional AI models are designed for large-scale cloud environments with ample GPUs. The computational environment at the edge is substantially different: it is much more resource-constrained. Fortunately, edge applications are often restricted to a narrower data sub-domain. For example, a vision application used for edge security (e.g., detecting intruders) might only need to recognize a relatively small number of object categories compared to a full-fledged general-purpose vision agent. Can model compression be driven by the inference-quality needs of only the relevant subset of data categories? Such an approach was used in the design of DeepIoT, an application-aware compression framework for neural network architectures. The chapter discusses DeepIoT and its extensions. It shows that, by exploiting the target data domain, dramatic reductions in resource footprint are possible without sacrificing inference quality in that domain.
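To make the idea of shrinking a network's resource footprint concrete, the sketch below shows one common family of techniques: structured (unit-level) pruning, which removes entire hidden units rather than individual weights so the resulting matrices are genuinely smaller. This is a generic illustration, not the DeepIoT algorithm itself (DeepIoT learns which units to drop; here, importance is approximated by the L2 norm of each unit's incoming weights, and the function name and `keep_fraction` parameter are hypothetical).

```python
import numpy as np

def prune_hidden_units(w1, w2, keep_fraction=0.5):
    """Structured-pruning sketch for a two-layer network.

    w1: (hidden, in) weights of the first layer
    w2: (out, hidden) weights of the second layer
    Drops the hidden units whose incoming weight rows have the
    smallest L2 norm, shrinking both matrices consistently.
    """
    norms = np.linalg.norm(w1, axis=1)            # crude per-unit importance
    n_keep = max(1, int(len(norms) * keep_fraction))
    keep = np.sort(np.argsort(norms)[-n_keep:])   # indices of retained units
    return w1[keep, :], w2[:, keep]

# Example: compress a 128-unit hidden layer to 32 units.
rng = np.random.default_rng(0)
w1 = rng.standard_normal((128, 64))
w2 = rng.standard_normal((10, 128))
p1, p2 = prune_hidden_units(w1, w2, keep_fraction=0.25)
print(p1.shape, p2.shape)  # (32, 64) (10, 32)
```

Because whole rows and columns are removed, the pruned layers run faster on commodity edge hardware without any sparse-kernel support, which is what makes structured approaches like DeepIoT attractive for resource-constrained devices.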

Original language: English (US)
Title of host publication: Artificial Intelligence for Edge Computing
Publisher: Springer
Pages: 153-195
Number of pages: 43
ISBN (Electronic): 9783031407871
ISBN (Print): 9783031407864
State: Published - Dec 21, 2023

ASJC Scopus subject areas

  • General Computer Science
  • General Engineering

