Minotaur: Adapting Software Testing Techniques for Hardware Errors

Abdulrahman Mahmoud, Radha Venkatagiri, Khalique Ahmed, Sasa Misailovic, Darko Marinov, Christopher Wardlaw Fletcher, Sarita V Adve

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

With the end of conventional CMOS scaling, efficient resiliency solutions are needed to address the increased likelihood of hardware errors. Silent data corruptions (SDCs) are especially harmful because they can create unacceptable output without the user's knowledge. Several resiliency analysis techniques have been proposed to identify SDC-causing instructions, but they remain too slow for practical use and/or sacrifice accuracy to improve analysis speed. We develop Minotaur, a novel toolkit to improve the speed and accuracy of resiliency analysis. The key insight behind Minotaur is that modern resiliency analysis has many conceptual similarities to software testing; therefore, adapting techniques from the rich software testing literature can lead to principled and significant improvements in resiliency analysis. Minotaur identifies and adapts four concepts from software testing: 1) it introduces the concept of input quality criteria for resiliency analysis and identifies PC coverage as a simple but effective criterion; 2) it creates (fast) minimized inputs from (slow) standard benchmark inputs, using the input quality criteria to assess the goodness of the created input; 3) it adapts the concept of test case prioritization to prioritize error injections and invoke early termination for a given instruction to speed up error-injection campaigns; and 4) it further adapts test case or input prioritization to accelerate SDC discovery across multiple inputs. We evaluate Minotaur by applying it to Approxilyzer, a state-of-the-art resiliency analysis tool. Minotaur's first three techniques speed up Approxilyzer's resiliency analysis by 10.3X (on average) for the workloads studied. Moreover, they identify 96% (on average) of all SDC-causing instructions explored, compared to 64% identified by Approxilyzer alone. Minotaur's fourth technique (input prioritization) enables identifying all SDC-causing instructions explored across multiple inputs at a speed 2.3X faster (on average) than analyzing each input independently for our workloads.
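
For illustration, the abstract's first two ideas (a PC-coverage input quality criterion and coverage-guided input minimization) can be sketched in a few lines of Python. This is not Minotaur's implementation: the ddmin-style reduction loop, the collect_pcs harness, the threshold parameter, and every name below are assumptions made only for this sketch.

from typing import Callable, Sequence, Set

def pc_coverage_ok(candidate_pcs: Set[int], reference_pcs: Set[int],
                   threshold: float = 1.0) -> bool:
    # Input quality criterion (assumed form): the reduced input must retain at
    # least `threshold` of the static program counters (PCs) exercised by the
    # original benchmark input.
    if not reference_pcs:
        return True
    return len(candidate_pcs & reference_pcs) / len(reference_pcs) >= threshold

def minimize_input(full_input: Sequence,
                   collect_pcs: Callable[[Sequence], Set[int]],
                   threshold: float = 1.0) -> list:
    # Greedy, chunk-halving reduction (ddmin-flavored sketch) guided by PC
    # coverage. `collect_pcs` is a user-supplied harness that runs the workload
    # on a candidate input and returns the set of executed PCs.
    reference_pcs = collect_pcs(full_input)
    current = list(full_input)
    chunk = max(1, len(current) // 2)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]  # try dropping one chunk
            if candidate and pc_coverage_ok(collect_pcs(candidate),
                                            reference_pcs, threshold):
                current = candidate   # smaller input still meets the criterion
            else:
                i += chunk            # this chunk is needed; keep it and move on
        chunk //= 2
    return current

A real harness would implement collect_pcs with a binary-instrumentation or simulation tool and then pass the minimized input to the error-injection campaign; the threshold knob trades minimization aggressiveness against coverage fidelity.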

Original language: English (US)
Title of host publication: ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: Association for Computing Machinery
Pages: 1087-1103
Number of pages: 17
ISBN (Electronic): 9781450362405
DOI: 10.1145/3297858.3304050
State: Published - Apr 4, 2019
Event: 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019 - Providence, United States
Duration: Apr 13, 2019 - Apr 17, 2019

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference: 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019
Country: United States
City: Providence
Period: 4/13/19 - 4/17/19

Keywords

  • Coverage metrics
  • Fault tolerance
  • Hardware reliability
  • Input minimization and prioritization
  • Resiliency analysis
  • Silent data corruption (SDC)
  • Software testing

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

Mahmoud, A., Venkatagiri, R., Ahmed, K., Misailovic, S., Marinov, D., Fletcher, C. W., & Adve, S. V. (2019). Minotaur: Adapting Software Testing Techniques for Hardware Errors. In ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 1087-1103). (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). Association for Computing Machinery. https://doi.org/10.1145/3297858.3304050
