Responding to Network Failures at Data-plane Speeds with Network Programmability

Jonatas A. Marques, Kirill Levchenko, Luciano Paschoal Gaspary

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Measurement studies show that equipment failures happen frequently and pose a challenge to reliable network operation. Recovering quickly from failures is critical to meeting service guarantees. Because traditional routing protocols run in a distributed fashion across multiple devices, they need non-negligible time to recompute routes after a failure. SDN with OpenFlow simplifies route recomputation, but the time needed to compute and install alternative forwarding entries can still cause significant packet loss. Existing fast failover mechanisms cannot handle all failure types and do not guarantee that the best paths are used. In this paper, we present Felix, an approach to failure recovery that reroutes around failures at data-plane timescales. Felix efficiently pre-computes tactics for handling failure scenarios; these tactics can then be activated quickly in the data plane when a failure occurs. Our evaluation shows that our approach can recover from failures up to three orders of magnitude faster than existing SDN approaches.
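The general idea of separating slow pre-computation from fast activation can be illustrated with a toy sketch. This is not Felix's algorithm: it assumes a hypothetical undirected topology given as an adjacency dict, considers only single-link failures, and uses plain BFS shortest paths. The helper names `shortest_next_hop`, `precompute_tactics`, and `forward` are illustrative only, and how a switch learns which link failed is left out of scope.

```python
from collections import deque
from typing import Dict, FrozenSet, Optional, Set

Graph = Dict[str, Set[str]]
Link = FrozenSet[str]


def shortest_next_hop(graph: Graph, src: str, dst: str, failed: Set[Link]) -> Optional[str]:
    """BFS from src to dst, skipping failed links; returns the first hop or None."""
    if src == dst:
        return None
    visited = {src}
    queue = deque()
    # Seed the queue with usable neighbors, remembering which first hop led there.
    for nbr in sorted(graph[src]):
        if frozenset((src, nbr)) not in failed:
            visited.add(nbr)
            queue.append((nbr, nbr))
    while queue:
        node, first_hop = queue.popleft()
        if node == dst:
            return first_hop
        for nbr in sorted(graph[node]):
            if nbr not in visited and frozenset((node, nbr)) not in failed:
                visited.add(nbr)
                queue.append((nbr, first_hop))
    return None  # dst unreachable under this failure


def precompute_tactics(graph: Graph, switch: str, dst: str) -> Dict[Optional[Link], Optional[str]]:
    """Slow, control-plane step (hypothetical): one next hop per single-link failure."""
    links = {frozenset((u, v)) for u in graph for v in graph[u]}
    table: Dict[Optional[Link], Optional[str]] = {None: shortest_next_hop(graph, switch, dst, set())}
    for link in links:
        table[link] = shortest_next_hop(graph, switch, dst, {link})
    return table


def forward(table: Dict[Optional[Link], Optional[str]], failed_link: Optional[Link]) -> Optional[str]:
    """Fast, data-plane step: a single table lookup, no route computation."""
    return table.get(failed_link, table[None])
```

For a four-node ring A–B–D–C–A, `precompute_tactics(g, "A", "D")` makes A forward via B normally and switch to C when link A–B or B–D is marked failed. The point of the design is that all path computation happens ahead of time, so reacting to a failure costs only a lookup.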

Original language: English (US)
Title of host publication: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2023, NOMS 2023
Editors: Kemal Akkaya, Olivier Festor, Carol Fung, Mohammad Ashiqur Rahman, Lisandro Zambenedetti Granville, Carlos Raniery Paula dos Santos
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665477161
State: Published - 2023
Event: 36th IEEE/IFIP Network Operations and Management Symposium, NOMS 2023, Miami, United States
Duration: May 8, 2023 to May 12, 2023

Publication series

Name: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2023, NOMS 2023

Conference

Conference: 36th IEEE/IFIP Network Operations and Management Symposium, NOMS 2023
Country/Territory: United States
City: Miami
Period: 5/8/23 to 5/12/23

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Modeling and Simulation
