Reasoning about modern datacenter infrastructures using partial histories

Xudong Sun, Lalith Suresh, Aishwarya Ganesan, Ramnatthan Alagappan, Michael Gasch, Lilia Tang, Tianyin Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Modern datacenter infrastructures are increasingly architected as a cluster of loosely coupled services. The cluster state is typically maintained in a logically centralized, strongly consistent data store (e.g., ZooKeeper, Chubby, and etcd), while services learn about the evolving state by reading from the data store or via a stream of notifications. However, it is challenging to ensure that services are correct, even in the presence of failures, networking issues, and the inherent asynchrony of the distributed system. In this paper, we identify that partial histories can be used to effectively reason about correctness for individual services in such distributed infrastructure systems. That is, individual services make decisions based on observing only a subset of changes to the world around them. We show that partial histories, when applied to distributed infrastructures, have immense explanatory power and utility over the state of the art. We discuss the implications of partial histories and sketch tooling for reasoning about distributed infrastructure systems.
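The abstract's central idea is that a service acts on only a subset of the changes committed to the data store. A small executable model can illustrate it. The sketch below is not the authors' tooling, and every name in it (`Event`, `partial_histories`, `edge_triggered`, `level_triggered`, `check`) is hypothetical. It assumes a store holding a single key. The full history is modeled as an ordered list of versioned writes, and a partial history is any order-preserving subsequence of that list.

Two hypothetical services each allocate an external resource per object UID. The edge-triggered service frees a UID only when it observes the delete event. The level-triggered service reconciles its allocations against its current view after every observation. The checker enumerates the partial histories in which a service has caught up to the latest write and reports those on which it fails to free exactly the UIDs that are no longer live.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    version: int          # store revision at which the write committed
    key: str
    value: Optional[str]  # object UID written to the key; None means a delete


def apply(view: Dict[str, str], ev: Event) -> Dict[str, str]:
    """Return the view obtained by applying one observed write."""
    new = dict(view)
    if ev.value is None:
        new.pop(ev.key, None)
    else:
        new[ev.key] = ev.value
    return new


def partial_histories(full: List[Event]) -> Iterator[List[Event]]:
    """Yield every order-preserving subsequence of the full history."""
    for k in range(len(full) + 1):
        for idxs in combinations(range(len(full)), k):
            yield [full[i] for i in idxs]


Service = Callable[[List[Event]], Tuple[Set[str], Set[str]]]


def edge_triggered(observed: List[Event]) -> Tuple[Set[str], Set[str]]:
    """Allocate on each observed put; free only on an observed delete."""
    allocated: Set[str] = set()
    freed: Set[str] = set()
    current: Optional[str] = None
    for ev in observed:
        if ev.value is not None:
            allocated.add(ev.value)
            current = ev.value  # a missed delete means the old UID is forgotten here
        elif current is not None:
            freed.add(current)
            current = None
    return allocated, freed


def level_triggered(observed: List[Event]) -> Tuple[Set[str], Set[str]]:
    """After every observation, reconcile allocations against the current view."""
    allocated: Set[str] = set()
    freed: Set[str] = set()
    view: Dict[str, str] = {}
    for ev in observed:
        view = apply(view, ev)
        live = set(view.values())
        allocated |= live
        freed |= allocated - live  # free anything no longer present in the view
    return allocated, freed


def check(service: Service, full: List[Event]) -> List[List[int]]:
    """Return the versions of each caught-up partial history on which the
    service does not free exactly the UIDs that are no longer live."""
    truth: Dict[str, str] = {}
    for ev in full:
        truth = apply(truth, ev)
    live_final = set(truth.values())
    violations = []
    for observed in partial_histories(full):
        # With a single key, having seen the latest write means the view is current.
        if not observed or observed[-1] != full[-1]:
            continue
        allocated, freed = service(observed)
        if freed != allocated - live_final:
            violations.append([ev.version for ev in observed])
    return violations


FULL_HISTORY = [
    Event(1, "parent", "uid-1"),  # object created
    Event(2, "parent", None),     # object deleted
    Event(3, "parent", "uid-2"),  # recreated under the same name with a new UID
]

print(check(edge_triggered, FULL_HISTORY))   # [[1, 3]]
print(check(level_triggered, FULL_HISTORY))  # []
```

Enumerating all 2^n subsequences is only practical for toy histories like this one. The edge-triggered service behaves correctly on the full history. However, it leaks `uid-1` when it misses the delete and observes only versions 1 and 3. The level-triggered service converges on every caught-up partial history in this example.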

Original language: English (US)
Title of host publication: HotOS 2021 - Proceedings of the 2021 Workshop on Hot Topics in Operating Systems
Publisher: Association for Computing Machinery
Number of pages: 8
ISBN (Electronic): 9781450384384
State: Published - Jun 1 2021
Event: 18th Workshop on Hot Topics in Operating Systems, HotOS 2021 - Virtual, Online, United States
Duration: Jun 1 2021 to Jun 3 2021


Conference: 18th Workshop on Hot Topics in Operating Systems, HotOS 2021
Country/Territory: United States
City: Virtual, Online


Keywords

  • correctness
  • datacenter infrastructure
  • distributed systems
  • partial history
  • reliability

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
