TY - GEN
T1 - QEDCartographer
T2 - 47th IEEE/ACM International Conference on Software Engineering, ICSE 2025
AU - Sanchez-Stern, Alex
AU - Varghese, Abhishek
AU - Kaufman, Zhanna
AU - Zhang, Dylan
AU - Ringer, Talia
AU - Brun, Yuriy
N1 - We thank Bruno Castro da Silva for help with reinforcement learning and Claire Le Goues for feedback on this paper. This work is supported by the National Science Foundation under grant no. CCF-2210243 and by the Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agencies (DARPA) under Contract No. FA8750-24-C-B044.
PY - 2025
Y1 - 2025
N2 - Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proofsynthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling rewardfree search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5 K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Comparing to the most effective prior tool, Proverbot9001, QEDCartographer produces 26% shorter proofs 27% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 31.8% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
AB - Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proofsynthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling rewardfree search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5 K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Comparing to the most effective prior tool, Proverbot9001, QEDCartographer produces 26% shorter proofs 27% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 31.8% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
KW - formal verification
KW - proof assistants
KW - proof synthesis
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105010321303
UR - https://www.scopus.com/inward/citedby.url?scp=105010321303&partnerID=8YFLogxK
U2 - 10.1109/ICSE55347.2025.00033
DO - 10.1109/ICSE55347.2025.00033
M3 - Conference contribution
AN - SCOPUS:105010321303
T3 - Proceedings - International Conference on Software Engineering
SP - 307
EP - 320
BT - Proceedings - 2025 IEEE/ACM 47th International Conference on Software Engineering, ICSE 2025
PB - IEEE Computer Society
Y2 - 27 April 2025 through 3 May 2025
ER -