TY - GEN
T1 - Failure-Resilient ML Inference at the Edge through Graceful Service Degradation
AU - Hanafy, Walid A.
AU - Wu, Li
AU - Abdelzaher, Tarek
AU - Diggavi, Suhas
AU - Shenoy, Prashant
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - With recent innovations in machine learning (ML) technologies, especially deep learning, many IoT applications have increasingly relied on ML models for various tasks, such as classification, detection, and decision-making. Most of these tasks are latency-sensitive and depend on models deployed at the edge of the network. Network and edge devices are prone to various kinds of failures, such as transient, crash, or Byzantine failures. Such failures can impact the IoT device's ability to offload tasks, affecting the system's reliability. A traditional solution involves replicating the underlying resources and deploying a failover replica of the ML model. However, edge resources are typically limited, and increasing their size incurs significant computational and infrastructure cost overheads. This paper proposes a range of failover strategies for resource-constrained edge environments, leveraging the flexibility offered by ML models. We explore various approaches for graceful service degradation, such as degraded accuracy, latency, and sampling rate, and highlight their potential benefits and trade-offs. Furthermore, we discuss the challenges associated with these techniques and outline future directions.
AB - With recent innovations in machine learning (ML) technologies, especially deep learning, many IoT applications have increasingly relied on ML models for various tasks, such as classification, detection, and decision-making. Most of these tasks are latency-sensitive and depend on models deployed at the edge of the network. Network and edge devices are prone to various kinds of failures, such as transient, crash, or Byzantine failures. Such failures can impact the IoT device's ability to offload tasks, affecting the system's reliability. A traditional solution involves replicating the underlying resources and deploying a failover replica of the ML model. However, edge resources are typically limited, and increasing their size incurs significant computational and infrastructure cost overheads. This paper proposes a range of failover strategies for resource-constrained edge environments, leveraging the flexibility offered by ML models. We explore various approaches for graceful service degradation, such as degraded accuracy, latency, and sampling rate, and highlight their potential benefits and trade-offs. Furthermore, we discuss the challenges associated with these techniques and outline future directions.
KW - Edge computing
KW - Graceful degradation
KW - ML inference
KW - Replication
KW - Resilience
UR - http://www.scopus.com/inward/record.url?scp=85182388504&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182388504&partnerID=8YFLogxK
U2 - 10.1109/MILCOM58377.2023.10356302
DO - 10.1109/MILCOM58377.2023.10356302
M3 - Conference contribution
AN - SCOPUS:85182388504
T3 - MILCOM 2023 - 2023 IEEE Military Communications Conference: Communications Supporting Military Operations in a Contested Environment
SP - 144
EP - 149
BT - MILCOM 2023 - 2023 IEEE Military Communications Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Military Communications Conference, MILCOM 2023
Y2 - 30 October 2023 through 3 November 2023
ER -