Scalable validation of binary lifters

Sandeep Dasgupta, Sushant Dinesh, Deepan Venkatesh, Vikram S. Adve, Christopher W. Fletcher

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Validating the correctness of binary lifters is pivotal to gain trust in binary analysis, especially when used in scenarios where correctness is important. Existing approaches focus on validating the correctness of lifting instructions or basic blocks in isolation and do not scale to full programs. In this work, we show that formal translation validation of single instructions for a complex ISA like x86-64 is not only practical, but can be used as a building block for scalable full-program validation. Our work is the first to do translation validation of single instructions on an architecture as extensive as x86-64, uses the most precise formal semantics available, and has the widest coverage in terms of the number of instructions tested for correctness. Next, we develop a novel technique that uses validated instructions to enable program-level validation, without resorting to performance-heavy semantic equivalence checking. Specifically, we compose the validated IR sequences using a tool we develop called Compositional Lifter to create a reference standard. The semantic equivalence check between the reference and the lifter output is then reduced to a graph-isomorphism check through the use of semantic preserving transformations. The translation validation of instructions in isolation revealed 29 new bugs in McSema - a mature open-source lifter from x86-64 to LLVM IR. Towards the validation of full programs, our approach was able to prove the translational correctness of 2254/2348 functions taken from LLVM's single-source benchmark test-suite.

Original languageEnglish (US)
Title of host publicationPLDI 2020 - Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
EditorsAlastair F. Donaldson, Emina Torlak
PublisherAssociation for Computing Machinery
Pages655-671
Number of pages17
ISBN (Electronic)9781450376136
DOIs
StatePublished - Jun 11 2020
Event41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020 - London, United Kingdom
Duration: Jun 15 2020Jun 20 2020

Publication series

NameProceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Conference

Conference41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020
Country/TerritoryUnited Kingdom
CityLondon
Period6/15/206/20/20

Keywords

  • Compiler Optimizations
  • Formal Semantics
  • Graph Isomorphism
  • LLVM IR
  • Translation Validation
  • x86-64

ASJC Scopus subject areas

  • Software

Fingerprint

Dive into the research topics of 'Scalable validation of binary lifters'. Together they form a unique fingerprint.

Cite this