Multi-Grained Specifications for Distributed System Model Checking and Verification

Lingzhi Ouyang, Xudong Sun, Ruize Tang, Yu Huang, Madhav Jivrajani, Xiaoxing Ma, Tianyin Xu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper presents our experience specifying and verifying the correctness of ZooKeeper, a complex and evolving distributed coordination system. We use TLA+ to model fine-grained behaviors of ZooKeeper and use the TLC model checker to verify its correctness properties; we also check conformance between the model and code. The fundamental challenge is to balance the granularity of specifications and the scalability of model checking-fine-grained specifications lead to state-space explosion, while coarse-grained specifications introduce model-code gaps. To address this challenge, we write specifications with different granularities for composable modules, and compose them into mixed-grained specifications based on specific scenarios. For example, to verify code changes, we compose fine-grained specifications of changed modules and coarse-grained specifications that abstract away details of unchanged code with preserved interactions. We show that writing multi-grained specifications is a viable practice and can cope with model-code gaps without untenable state space, especially for evolving software where changes are typically local and incremental. We detected six severe bugs that violate five types of invariants and verified their code fixes; the fixes have been merged to ZooKeeper. We also improve the protocol design to make it easy to implement correctly.

Original languageEnglish (US)
Title of host publicationEuroSys 2025 - Proceedings of the 2025 20th European Conference on Computer Systems
PublisherAssociation for Computing Machinery
Pages379-395
Number of pages17
ISBN (Electronic)9798400711961
DOIs
StatePublished - Mar 30 2025
Event20th European Conference on Computer Systems, EuroSys 2025, co-located 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025 - Rotterdam, Netherlands
Duration: Mar 30 2025Apr 3 2025

Publication series

NameEuroSys 2025 - Proceedings of the 2025 20th European Conference on Computer Systems

Conference

Conference20th European Conference on Computer Systems, EuroSys 2025, co-located 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025
Country/TerritoryNetherlands
CityRotterdam
Period3/30/254/3/25

Keywords

  • Distributed systems
  • model checking
  • reliability

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Control and Systems Engineering

Fingerprint

Dive into the research topics of 'Multi-Grained Specifications for Distributed System Model Checking and Verification'. Together they form a unique fingerprint.

Cite this