Abstract
Modern datacenter infrastructures are increasingly architected as a cluster of loosely coupled services. The cluster states are typically maintained in a logically centralized, strongly consistent data store (e.g., ZooKeeper, Chubby and etcd), while the services learn about the evolving state by reading from the data store, or via a stream of notifications. However, it is challenging to ensure services are correct, even in the presence of failures, networking issues, and the inherent asynchrony of the distributed system. In this paper, we identify that partial histories can be used to effectively reason about correctness for individual services in such distributed infrastructure systems. That is, individual services make decisions based on observing only a subset of changes to the world around them. We show that partial histories, when applied to distributed infrastructures, have immense explanatory power and utility over the state of the art. We discuss the implications of partial histories and sketch tooling for reasoning about distributed infrastructure systems.
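As a rough illustration of the idea in the abstract, the sketch below models a full history as an ordered list of changes in the data store, and a partial history as an order-preserving subset of those changes seen by one service. It is not taken from the paper: the `Change` type, the `replay` and `is_partial_history` helpers, and the `pod/a` and `pod/b` keys are all hypothetical names chosen for this example.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Change(NamedTuple):
    """One committed change in the data store (hypothetical model)."""
    revision: int
    key: str
    value: Optional[str]  # None models a deletion of the key


def replay(changes: Iterable[Change]) -> Dict[str, str]:
    """Build the view of the world implied by a sequence of changes."""
    state: Dict[str, str] = {}
    for change in changes:
        if change.value is None:
            state.pop(change.key, None)
        else:
            state[change.key] = change.value
    return state


def is_partial_history(observed: List[Change], full: List[Change]) -> bool:
    """Return True if `observed` is an order-preserving subset of `full`."""
    remaining = iter(full)
    # Each observed change must appear in `full` after the previous match.
    return all(any(seen == committed for committed in remaining)
               for seen in observed)


# Full history as committed by the data store (illustrative values).
full_history = [
    Change(1, "pod/a", "node-1"),
    Change(2, "pod/a", None),       # pod/a is deleted
    Change(3, "pod/a", "node-2"),   # pod/a is recreated elsewhere
    Change(4, "pod/b", "node-1"),
]

# A service that missed revisions 2 and 3, e.g. during a disconnect.
observed_history = [full_history[0], full_history[3]]

full_view = replay(full_history)
service_view = replay(observed_history)
```

In this example `observed_history` is a valid partial history, yet the service's view still places `pod/a` on `node-1` while the data store places it on `node-2`. A decision based on the partial view alone could therefore be incorrect.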
Original language | English (US) |
---|---|
Title of host publication | HotOS 2021 - Proceedings of the 2021 Workshop on Hot Topics in Operating Systems |
Publisher | Association for Computing Machinery |
Pages | 213-220 |
Number of pages | 8 |
ISBN (Electronic) | 9781450384384 |
State | Published - Jun 1 2021 |
Event | 18th Workshop on Hot Topics in Operating Systems, HotOS 2021 - Virtual, Online, United States. Duration: Jun 1 2021 → Jun 3 2021 |
Conference
Conference | 18th Workshop on Hot Topics in Operating Systems, HotOS 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 6/1/21 → 6/3/21 |
Keywords
- correctness
- datacenter infrastructure
- distributed systems
- partial history
- reliability
ASJC Scopus subject areas
- Information Systems
- Computer Networks and Communications
- Hardware and Architecture