TY - GEN
T1 - Anvil
T2 - 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024
AU - Sun, Xudong
AU - Ma, Wenjie
AU - Gu, Jiawei Tyler
AU - Ma, Zicheng
AU - Chajed, Tej
AU - Howell, Jon
AU - Lattuada, Andrea
AU - Padon, Oded
AU - Suresh, Lalith
AU - Szekeres, Adriana
AU - Xu, Tianyin
N1 - We thank the anonymous reviewers for their insightful comments on the paper. We thank Jay Lorch, Owolabi Legunsen, Ramnatthan Alagappan, and Yongle Zhang for their valuable feedback and discussions that helped shape this work. This work was funded in part by NSF CNS-2145295 and a VMware Research Gift.
PY - 2024
Y1 - 2024
N2 - Modern clouds depend crucially on an extensible ecosystem of thousands of controllers, each managing critical systems (e.g., a ZooKeeper cluster). A controller continuously reconciles the current state of the system to a desired state according to a declarative description. However, controllers have bugs that make them never achieve the desired state, due to concurrency, asynchrony, and failures; there are cases where after an inopportune failure, a controller can make no further progress. Formal verification is promising for avoiding bugs in distributed systems, but most work so far focused on safety, whereas reconciliation is fundamentally not a safety property. This paper develops the first tool to apply formal verification to the problem of controller correctness, with a general specification we call eventually stable reconciliation, written as a concise temporal logic liveness property. We present Anvil, a framework for developing controller implementations in Rust and verifying that the controllers correctly implement eventually stable reconciliation. We use Anvil to verify three Kubernetes controllers for managing ZooKeeper, RabbitMQ, and FluentBit, which can readily be deployed in Kubernetes platforms and are comparable in terms of features and performance to widely used unverified controllers.
AB - Modern clouds depend crucially on an extensible ecosystem of thousands of controllers, each managing critical systems (e.g., a ZooKeeper cluster). A controller continuously reconciles the current state of the system to a desired state according to a declarative description. However, controllers have bugs that make them never achieve the desired state, due to concurrency, asynchrony, and failures; there are cases where after an inopportune failure, a controller can make no further progress. Formal verification is promising for avoiding bugs in distributed systems, but most work so far focused on safety, whereas reconciliation is fundamentally not a safety property. This paper develops the first tool to apply formal verification to the problem of controller correctness, with a general specification we call eventually stable reconciliation, written as a concise temporal logic liveness property. We present Anvil, a framework for developing controller implementations in Rust and verifying that the controllers correctly implement eventually stable reconciliation. We use Anvil to verify three Kubernetes controllers for managing ZooKeeper, RabbitMQ, and FluentBit, which can readily be deployed in Kubernetes platforms and are comparable in terms of features and performance to widely used unverified controllers.
UR - http://www.scopus.com/inward/record.url?scp=85201234086&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85201234086&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85201234086
T3 - Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024
SP - 649
EP - 666
BT - Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2024
PB - USENIX Association
Y2 - 10 July 2024 through 12 July 2024
ER -