Light64: Lightweight hardware support for data race detection during systematic testing of parallel programs

Research output: Contribution to journalConference article

Abstract

Developing and testing parallel code is hard. Even for one given input, a parallel program can have many possible different thread interleavings, which are hard for the programmer to foresee and for a testing tool to cover using stress or random testing. For this reason, a recent trend is to use Systematic Testing, which methodically explores different thread interleavings, while checking for various bugs. Data races are common bugs but, unfortunately, checking for races is often skipped in systematic testers because it introduces substantial runtime overhead if done purely in software. Recently, several techniques for race detection in hardware have been proposed, but they still require significant hardware support. This paper presents Light64, a novel technique for data race detection during systematic testing that has both small runtime overhead and very lightweight hardware requirements. Light64 is based on the observation that two thread interleavings in which racing accesses are flipped will very likely exhibit some deviation in their program execution history. Light64 computes a 64-bit hash of the program execution history during systematic testing. If the hashes of two interleavings with the same happens-before graph differ, then a race has occurred. Light64 only needs a 64-bit register per core, a drastic improvement over previous hardware schemes. In addition, our experiments on SPLASH-2 applications show that Light64 has no false positives, detects 96% of races, and induces only a small slowdown for race-free executions - on average, 1% and 37% in two different modes.

Original languageEnglish (US)
Pages (from-to)541-552
Number of pages12
JournalProceedings of the Annual International Symposium on Microarchitecture, MICRO
DOIs
StatePublished - Dec 1 2009
Event42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42 - New York, NY, United States
Duration: Dec 12 2009Dec 16 2009

Fingerprint

Hardware
Testing
Codes (standards)
Experiments

Keywords

  • Data race
  • Execution history hash
  • Systematic testing

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

@article{11974caf4d6b4ea29ad8d2e58759eda7,
title = "Light64: Lightweight hardware support for data race detection during systematic testing of parallel programs",
abstract = "Developing and testing parallel code is hard. Even for one given input, a parallel program can have many possible different thread interleavings, which are hard for the programmer to foresee and for a testing tool to cover using stress or random testing. For this reason, a recent trend is to use Systematic Testing, which methodically explores different thread interleavings, while checking for various bugs. Data races are common bugs but, unfortunately, checking for races is often skipped in systematic testers because it introduces substantial runtime overhead if done purely in software. Recently, several techniques for race detection in hardware have been proposed, but they still require significant hardware support. This paper presents Light64, a novel technique for data race detection during systematic testing that has both small runtime overhead and very lightweight hardware requirements. Light64 is based on the observation that two thread interleavings in which racing accesses are flipped will very likely exhibit some deviation in their program execution history. Light64 computes a 64-bit hash of the program execution history during systematic testing. If the hashes of two interleavings with the same happens-before graph differ, then a race has occurred. Light64 only needs a 64-bit register per core, a drastic improvement over previous hardware schemes. In addition, our experiments on SPLASH-2 applications show that Light64 has no false positives, detects 96{\%} of races, and induces only a small slowdown for race-free executions - on average, 1{\%} and 37{\%} in two different modes.",
keywords = "Data race, Execution history hash, Systematic testing",
author = "Adrian Nistor and Darko Marinov and Josep Torrellas",
year = "2009",
month = "12",
day = "1",
doi = "10.1145/1669112.1669180",
language = "English (US)",
pages = "541--552",
journal = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
issn = "1072-4451",

}

TY - JOUR

T1 - Light64

T2 - Lightweight hardware support for data race detection during systematic testing of parallel programs

AU - Nistor, Adrian

AU - Marinov, Darko

AU - Torrellas, Josep

PY - 2009/12/1

Y1 - 2009/12/1

N2 - Developing and testing parallel code is hard. Even for one given input, a parallel program can have many possible different thread interleavings, which are hard for the programmer to foresee and for a testing tool to cover using stress or random testing. For this reason, a recent trend is to use Systematic Testing, which methodically explores different thread interleavings, while checking for various bugs. Data races are common bugs but, unfortunately, checking for races is often skipped in systematic testers because it introduces substantial runtime overhead if done purely in software. Recently, several techniques for race detection in hardware have been proposed, but they still require significant hardware support. This paper presents Light64, a novel technique for data race detection during systematic testing that has both small runtime overhead and very lightweight hardware requirements. Light64 is based on the observation that two thread interleavings in which racing accesses are flipped will very likely exhibit some deviation in their program execution history. Light64 computes a 64-bit hash of the program execution history during systematic testing. If the hashes of two interleavings with the same happens-before graph differ, then a race has occurred. Light64 only needs a 64-bit register per core, a drastic improvement over previous hardware schemes. In addition, our experiments on SPLASH-2 applications show that Light64 has no false positives, detects 96% of races, and induces only a small slowdown for race-free executions - on average, 1% and 37% in two different modes.

AB - Developing and testing parallel code is hard. Even for one given input, a parallel program can have many possible different thread interleavings, which are hard for the programmer to foresee and for a testing tool to cover using stress or random testing. For this reason, a recent trend is to use Systematic Testing, which methodically explores different thread interleavings, while checking for various bugs. Data races are common bugs but, unfortunately, checking for races is often skipped in systematic testers because it introduces substantial runtime overhead if done purely in software. Recently, several techniques for race detection in hardware have been proposed, but they still require significant hardware support. This paper presents Light64, a novel technique for data race detection during systematic testing that has both small runtime overhead and very lightweight hardware requirements. Light64 is based on the observation that two thread interleavings in which racing accesses are flipped will very likely exhibit some deviation in their program execution history. Light64 computes a 64-bit hash of the program execution history during systematic testing. If the hashes of two interleavings with the same happens-before graph differ, then a race has occurred. Light64 only needs a 64-bit register per core, a drastic improvement over previous hardware schemes. In addition, our experiments on SPLASH-2 applications show that Light64 has no false positives, detects 96% of races, and induces only a small slowdown for race-free executions - on average, 1% and 37% in two different modes.

KW - Data race

KW - Execution history hash

KW - Systematic testing

UR - http://www.scopus.com/inward/record.url?scp=76749145125&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76749145125&partnerID=8YFLogxK

U2 - 10.1145/1669112.1669180

DO - 10.1145/1669112.1669180

M3 - Conference article

AN - SCOPUS:76749145125

SP - 541

EP - 552

JO - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

JF - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SN - 1072-4451

ER -