Minotaur: Adapting Software Testing Techniques for Hardware Errors

Abdulrahman Mahmoud, Radha Venkatagiri, Khalique Ahmed, Sasa Misailovic, Darko Marinov, Christopher W. Fletcher, Sarita V. Adve

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the end of conventional CMOS scaling, efficient resiliency solutions are needed to address the increased likelihood of hardware errors. Silent data corruptions (SDCs) are especially harmful because they can create unacceptable output without the user's knowledge. Several resiliency analysis techniques have been proposed to identify SDC-causing instructions, but they remain too slow for practical use and/or sacrifice accuracy to improve analysis speed. We develop Minotaur, a novel toolkit to improve the speed and accuracy of resiliency analysis. The key insight behind Minotaur is that modern resiliency analysis has many conceptual similarities to software testing; therefore, adapting techniques from the rich software testing literature can lead to principled and significant improvements in resiliency analysis. Minotaur identifies and adapts four concepts from software testing: 1) it introduces the concept of input quality criteria for resiliency analysis and identifies PC coverage as a simple but effective criterion; 2) it creates (fast) minimized inputs from (slow) standard benchmark inputs, using the input quality criteria to assess the goodness of the created input; 3) it adapts the concept of test case prioritization to prioritize error injections and invoke early termination for a given instruction to speed up error-injection campaigns; and 4) it further adapts test case or input prioritization to accelerate SDC discovery across multiple inputs. We evaluate Minotaur by applying it to Approxilyzer, a state-of-the-art resiliency analysis tool. Minotaur's first three techniques speed up Approxilyzer's resiliency analysis by 10.3X (on average) for the workloads studied. Moreover, they identify 96% (on average) of all SDC-causing instructions explored, compared to 64% identified by Approxilyzer alone. 
Minotaur's fourth technique (input prioritization) enables identifying all SDC-causing instructions explored across multiple inputs at a speed 2.3X faster (on average) than analyzing each input independently for our workloads.
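The first two ideas in the abstract (PC coverage as an input quality criterion, and choosing a fast minimized input that still meets it) can be illustrated with a minimal sketch. This is not Minotaur's implementation: the function names, the representation of a trace as a list of program-counter values, the coverage threshold, and the use of trace length as a stand-in for run time are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set


def pc_coverage(trace: Iterable[int], reference_pcs: Set[int]) -> float:
    """Fraction of the reference input's static PCs that this trace also executes."""
    if not reference_pcs:
        raise ValueError("reference_pcs must not be empty")
    return len(set(trace) & reference_pcs) / len(reference_pcs)


def pick_minimized_input(
    candidates: Dict[str, List[int]],
    reference_trace: List[int],
    threshold: float = 1.0,
) -> Optional[str]:
    """Return the candidate with the shortest trace whose PC coverage meets the threshold.

    Trace length is used here as a hypothetical proxy for analysis cost.
    Returns None when no candidate is good enough.
    """
    reference_pcs = set(reference_trace)
    good = [
        (len(trace), name)
        for name, trace in candidates.items()
        if pc_coverage(trace, reference_pcs) >= threshold
    ]
    return min(good)[1] if good else None


# Hypothetical data: a long reference run and three smaller candidate inputs.
reference_trace = [0x400, 0x404, 0x408, 0x404, 0x408, 0x40C] * 1000
candidates = {
    "tiny": [0x400, 0x404, 0x40C],                         # skips PC 0x408
    "small": [0x400, 0x404, 0x408, 0x40C],                 # covers every PC once
    "medium": [0x400, 0x404, 0x408, 0x404, 0x408, 0x40C],  # covers every PC, longer
}

chosen = pick_minimized_input(candidates, reference_trace)
```

In this toy data, "tiny" is the shortest candidate but misses one static instruction, so the full-coverage requirement rejects it in favor of "small". Lowering the threshold trades coverage for speed, which is the kind of trade-off an input quality criterion makes explicit.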

Original language: English (US)
Title of host publication: ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: Association for Computing Machinery
Pages: 1087-1103
Number of pages: 17
ISBN (Electronic): 9781450362405
DOI: https://doi.org/10.1145/3297858.3304050
State: Published - Apr 4 2019
Event: 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019 - Providence, United States
Duration: Apr 13 2019 - Apr 17 2019

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference: 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019
Country: United States
City: Providence
Period: 4/13/19 - 4/17/19

Keywords

  • Coverage metrics
  • Fault tolerance
  • Hardware reliability
  • Input minimization and prioritization
  • Resiliency analysis
  • Silent data corruption (SDC)
  • Software testing

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

Mahmoud, A., Venkatagiri, R., Ahmed, K., Misailovic, S., Marinov, D., Fletcher, C. W., & Adve, S. V. (2019). Minotaur: Adapting Software Testing Techniques for Hardware Errors. In ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 1087-1103). (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). Association for Computing Machinery. https://doi.org/10.1145/3297858.3304050