TY - GEN
T1 - Responding to Network Failures at Data-plane Speeds with Network Programmability
AU - Marques, Jonatas A.
AU - Levchenko, Kirill
AU - Gaspary, Luciano Paschoal
N1 - This work was supported in part by CAPES - Brazil (Finance Code 1), CNPq - Brazil, RNP - Brazil, FAPESP - Brazil (#2020/05183-0), and CYTED - Spain (#519RT0580).
PY - 2023
Y1 - 2023
N2 - Measurement studies show that equipment failures happen quite frequently and pose a challenge to reliable network operation. Quickly recovering from failures is critical to meeting service guarantees. Traditional routing protocols, because they execute in a distributed fashion across multiple devices in a network, require non-negligible time to recompute routes upon failures. SDN with OpenFlow simplifies route recomputation, but the time to compute and install alternative forwarding entries can still result in significant packet loss. Existing fast failover mechanisms cannot handle all types of failure and do not guarantee the use of the best paths. In this paper, we present FELIX, an approach for failure recovery that reroutes around failures at data-plane timescales. FELIX works by efficiently pre-computing tactics to handle failure scenarios that can be quickly activated in the data plane in response to failures. Our evaluation shows that our approach can recover from failures up to three orders of magnitude faster than existing SDN approaches.
AB - Measurement studies show that equipment failures happen quite frequently and pose a challenge to reliable network operation. Quickly recovering from failures is critical to meeting service guarantees. Traditional routing protocols, because they execute in a distributed fashion across multiple devices in a network, require non-negligible time to recompute routes upon failures. SDN with OpenFlow simplifies route recomputation, but the time to compute and install alternative forwarding entries can still result in significant packet loss. Existing fast failover mechanisms cannot handle all types of failure and do not guarantee the use of the best paths. In this paper, we present FELIX, an approach for failure recovery that reroutes around failures at data-plane timescales. FELIX works by efficiently pre-computing tactics to handle failure scenarios that can be quickly activated in the data plane in response to failures. Our evaluation shows that our approach can recover from failures up to three orders of magnitude faster than existing SDN approaches.
UR - http://www.scopus.com/inward/record.url?scp=85164735809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164735809&partnerID=8YFLogxK
U2 - 10.1109/NOMS56928.2023.10154329
DO - 10.1109/NOMS56928.2023.10154329
M3 - Conference contribution
AN - SCOPUS:85164735809
T3 - Proceedings of IEEE/IFIP Network Operations and Management Symposium 2023, NOMS 2023
BT - Proceedings of IEEE/IFIP Network Operations and Management Symposium 2023, NOMS 2023
A2 - Akkaya, Kemal
A2 - Festor, Olivier
A2 - Fung, Carol
A2 - Rahman, Mohammad Ashiqur
A2 - Granville, Lisandro Zambenedetti
A2 - dos Santos, Carlos Raniery Paula
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 36th IEEE/IFIP Network Operations and Management Symposium, NOMS 2023
Y2 - 8 May 2023 through 12 May 2023
ER -