Abstract
We are now ready to address what is arguably the primary inference challenge in Edge AI applications, namely, compressing an AI model to fit the limited resources of edge devices. Many traditional AI models are designed for large-scale cloud environments with ample GPUs. The computational environment at the edge is substantially different: it is far more resource-constrained. Fortunately, edge applications are often also restricted to a narrower data sub-domain. For example, a vision application used for edge security (e.g., detecting intruders) might need to recognize only a small number of object categories compared to a full-fledged general-purpose vision agent. Can model compression be driven by the inference quality needs of only the relevant subset of data categories? Such an approach was used in the design of DeepIoT, an application-aware compression framework for neural network architectures. The chapter discusses DeepIoT and its extensions. It shows that dramatic reductions in resource footprint are possible by exploiting the target data domain, without sacrificing inference quality in that domain.
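The chapter's algorithms are not reproduced in this abstract, but the core idea, compressing only as far as accuracy on the application's relevant categories allows, can be illustrated with a minimal sketch. The sketch below is not the DeepIoT algorithm: it substitutes simple L2-norm structured channel pruning (PyTorch's `torch.nn.utils.prune.ln_structured`), and the `RELEVANT_CLASSES` set, the `TinyNet` model, and the data loader are hypothetical placeholders. It selects the most aggressive pruning ratio whose accuracy on the relevant sub-domain stays within a tolerance of the unpruned model.

```python
# Minimal illustrative sketch of sub-domain-driven compression (not DeepIoT itself).
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

RELEVANT_CLASSES = {0, 3, 7}  # hypothetical: the few categories the edge application cares about

class TinyNet(nn.Module):
    """Stand-in classifier; any torch.nn.Module with Conv2d layers would do."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def subdomain_accuracy(model, loader, relevant=RELEVANT_CLASSES):
    """Accuracy measured only on samples whose label lies in the relevant sub-domain."""
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            mask = torch.tensor([int(label) in relevant for label in y], dtype=torch.bool)
            if mask.sum() == 0:
                continue
            pred = model(x[mask]).argmax(dim=1)
            correct += (pred == y[mask]).sum().item()
            total += int(mask.sum())
    return correct / max(total, 1)

def compress_for_subdomain(model, loader, tolerance=0.01):
    """Pick the most aggressive channel-pruning ratio whose sub-domain accuracy
    stays within `tolerance` of the unpruned model."""
    baseline = subdomain_accuracy(model, loader)
    best = model
    for amount in (0.2, 0.4, 0.6, 0.8):
        candidate = copy.deepcopy(model)
        for m in candidate.modules():
            if isinstance(m, nn.Conv2d):
                # L2-norm structured pruning over output channels (dim=0).
                prune.ln_structured(m, name="weight", amount=amount, n=2, dim=0)
        if baseline - subdomain_accuracy(candidate, loader) <= tolerance:
            best = candidate  # still good enough on the relevant classes
        else:
            break  # too much accuracy lost on the sub-domain; stop tightening
    return best

# Hypothetical usage with random tensors standing in for a real data loader.
fake_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(4)]
compact_model = compress_for_subdomain(TinyNet(), fake_loader)
```

The point of the sketch is the evaluation criterion, not the pruning method: accuracy is measured only on the categories the application actually needs, which is what allows more aggressive compression than a general-purpose accuracy target would permit.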
| Original language | English (US) |
| --- | --- |
| Title of host publication | Artificial Intelligence for Edge Computing |
| Publisher | Springer |
| Pages | 153-195 |
| Number of pages | 43 |
| ISBN (Electronic) | 9783031407871 |
| ISBN (Print) | 9783031407864 |
| DOIs | |
| State | Published - Dec 21 2023 |
ASJC Scopus subject areas
- General Computer Science
- General Engineering