Formal methods applied to high-performance computing software design: A case study of MPI one-sided communication-based locking

Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Rajeev Thakur, William Gropp

Research output: Contribution to journalArticlepeer-review

Abstract

There is a growing need to address the complexity of verifying the numerous concurrent protocols employed in the high-performance computing software. Today's approaches for verification consist of testing detailed implementations of these protocols. Unfortunately, this approach can seldom show the absence of bugs, and often results in serious bugs escaping into the deployed software. An approach called Model Checking has been demonstrated to be eminently helpful in debugging these protocols early in the software life cycle by offering the ability to represent and exhaustively analyze simplified formal protocol models. The effectiveness of model checking has yet to be adequately demonstrated in high-performance computing. This paper presents a case study of a concurrent protocol that was thought to be sufficiently well tested, but proved to contain two very non-obvious deadlocks in them. These bugs were automatically detected through model checking. The protocol models in which these bugs were detected were also easy to create. Recent work in our group demonstrates that even this tedium of model creation can be eliminated by employing dynamic source-code-level analysis methods. Our case study comes from the important domain of Message Passing Interface (MPI)-based programming, which is universally employed for simulating and predicting anything from the structural integrity of combustion chambers to the path of hurricanes. We argue that model checking must be taught as well as used widely within HPC, given this and similar success stories. Copyright.

Original languageEnglish (US)
Pages (from-to)23-43
Number of pages21
JournalSoftware - Practice and Experience
Volume40
Issue number1
DOIs
StatePublished - Jan 2010

Keywords

  • Concurrent programming
  • Dynamic analysis
  • Formal verification
  • High-performance computing (HPC)
  • Message Passing Interface (MPI)
  • Model checking
  • One-sided communication
  • Race condition
  • SPIN

ASJC Scopus subject areas

  • Software

Fingerprint Dive into the research topics of 'Formal methods applied to high-performance computing software design: A case study of MPI one-sided communication-based locking'. Together they form a unique fingerprint.

Cite this