TY - GEN
T1 - Scalable validation of binary lifters
AU - Dasgupta, Sandeep
AU - Dinesh, Sushant
AU - Venkatesh, Deepan
AU - Adve, Vikram S.
AU - Fletcher, Christopher W.
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/6/11
Y1 - 2020/6/11
N2 - Validating the correctness of binary lifters is pivotal to gain trust in binary analysis, especially when used in scenarios where correctness is important. Existing approaches focus on validating the correctness of lifting instructions or basic blocks in isolation and do not scale to full programs. In this work, we show that formal translation validation of single instructions for a complex ISA like x86-64 is not only practical, but can be used as a building block for scalable full-program validation. Our work is the first to do translation validation of single instructions on an architecture as extensive as x86-64, uses the most precise formal semantics available, and has the widest coverage in terms of the number of instructions tested for correctness. Next, we develop a novel technique that uses validated instructions to enable program-level validation, without resorting to performance-heavy semantic equivalence checking. Specifically, we compose the validated IR sequences using a tool we develop called Compositional Lifter to create a reference standard. The semantic equivalence check between the reference and the lifter output is then reduced to a graph-isomorphism check through the use of semantic preserving transformations. The translation validation of instructions in isolation revealed 29 new bugs in McSema - a mature open-source lifter from x86-64 to LLVM IR. Towards the validation of full programs, our approach was able to prove the translational correctness of 2254/2348 functions taken from LLVM's single-source benchmark test-suite.
AB - Validating the correctness of binary lifters is pivotal to gain trust in binary analysis, especially when used in scenarios where correctness is important. Existing approaches focus on validating the correctness of lifting instructions or basic blocks in isolation and do not scale to full programs. In this work, we show that formal translation validation of single instructions for a complex ISA like x86-64 is not only practical, but can be used as a building block for scalable full-program validation. Our work is the first to do translation validation of single instructions on an architecture as extensive as x86-64, uses the most precise formal semantics available, and has the widest coverage in terms of the number of instructions tested for correctness. Next, we develop a novel technique that uses validated instructions to enable program-level validation, without resorting to performance-heavy semantic equivalence checking. Specifically, we compose the validated IR sequences using a tool we develop called Compositional Lifter to create a reference standard. The semantic equivalence check between the reference and the lifter output is then reduced to a graph-isomorphism check through the use of semantic preserving transformations. The translation validation of instructions in isolation revealed 29 new bugs in McSema - a mature open-source lifter from x86-64 to LLVM IR. Towards the validation of full programs, our approach was able to prove the translational correctness of 2254/2348 functions taken from LLVM's single-source benchmark test-suite.
KW - Compiler Optimizations
KW - Formal Semantics
KW - Graph Isomorphism
KW - LLVM IR
KW - Translation Validation
KW - x86-64
UR - http://www.scopus.com/inward/record.url?scp=85086820298&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086820298&partnerID=8YFLogxK
U2 - 10.1145/3385412.3385964
DO - 10.1145/3385412.3385964
M3 - Conference contribution
AN - SCOPUS:85086820298
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 655
EP - 671
BT - PLDI 2020 - Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
A2 - Donaldson, Alastair F.
A2 - Torlak, Emina
PB - Association for Computing Machinery
T2 - 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020
Y2 - 15 June 2020 through 20 June 2020
ER -