Robust non-intrusive record-replay with processor extraction

Filippo Gioachin, Gengbin Zheng, Laxmikant V. Kalé

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the advent of increasingly larger parallel machines, debugging is becoming more and more challenging. In particular, applications at this scale tend to behave non-deterministically, leading to race condition bugs. Furthermore, gaining access to these large machines for long debugging sessions is generally infeasible. In this paper, we present a 3-step algorithm to perform what we call "processor extraction": a procedure to record the execution of a set of processors from a parallel application and replay any of them in a controlled environment. Our technique generates very low interference in the recorded program thanks to the separation between non-determinism elimination and detailed processor recording. To improve robustness and accuracy, we further augmented our algorithm with a self-correction mechanism.
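The abstract does not spell out the three steps, so the following Python sketch is only a hypothetical illustration of the general idea it describes: a cheap first pass that logs nothing but message delivery order (eliminating non-determinism), a second pass that forces that order and captures full message contents for one processor, and a third pass that re-runs that processor alone from the captured messages. The names `Message`, `record_order`, `replay_and_capture` and `standalone_replay` are invented for this example, the use of a (sender, sequence number) pair as a message identifier is an assumption, and the paper's self-correction mechanism is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int   # id of the sending processor
    seq: int      # per-sender sequence number (assumed), so (sender, seq) is unique
    payload: int


# Hypothetical step 1 (lightweight): log only the delivery order, never payloads.
def record_order(delivered):
    return [(m.sender, m.seq) for m in delivered]


# Hypothetical step 2: re-run with delivery forced into the recorded order and
# capture the full messages received by the processor being extracted.
def replay_and_capture(order_log, arrived):
    pool = {(m.sender, m.seq): m for m in arrived}
    if set(pool) != set(order_log):
        raise RuntimeError("message set differs from the recorded run")
    return [pool[key] for key in order_log]


# Hypothetical step 3: run the extracted processor by itself, feeding it the
# captured messages instead of receiving them from the rest of the machine.
def standalone_replay(handler, captured):
    state = 0
    for m in captured:
        state = handler(state, m)
    return state


def order_sensitive(state, m):
    # The result depends on the order in which messages are delivered.
    return state * 10 + m.payload


if __name__ == "__main__":
    a, b, c = Message(1, 0, 3), Message(2, 0, 5), Message(1, 1, 7)
    log = record_order([a, b, c])        # first run: a, b, c
    second_run = [b, a, c]               # a later run sees a different interleaving
    captured = replay_and_capture(log, second_run)
    print(standalone_replay(order_sensitive, captured))    # 357, as in the first run
    print(standalone_replay(order_sensitive, second_run))  # 537 without order replay
```

Logging only identifiers in the first pass keeps its cost small; payloads are captured only in the second, already deterministic, pass and only for the processor of interest.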

Original language: English (US)
Title of host publication: PADTAD 2010 - International Workshop on Parallel and Distributed Systems
Subtitle of host publication: Testing, Analysis, and Debugging
Pages: 9-19
Number of pages: 11
State: Published - Dec 17 2010
Event: 8th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, PADTAD'10 - Trento, Italy
Duration: Jul 13 2010 to Jul 13 2010

Publication series

Name: PADTAD 2010 - International Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging

Other

Other: 8th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, PADTAD'10
Country: Italy
City: Trento
Period: 7/13/10 to 7/13/10

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

