REINAM: Reinforcement learning for input-grammar inference

Zhengkai Wu, Evan Johnson, Wei Yang, Osbert Bastani, Dawn Song, Jian Peng, Tao Xie

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Program input grammars (i.e., grammars encoding the language of valid program inputs) facilitate a wide range of applications in software engineering such as symbolic execution and delta debugging. Grammars synthesized by existing approaches can cover only a small part of the valid input space mainly due to unanalyzable code (e.g., native code) in programs and lacking high-quality and high-variety seed inputs. To address these challenges, we present REINAM, a reinforcement-learning approach for synthesizing probabilistic context-free program input grammars without any seed inputs. REINAM uses an industrial symbolic execution engine to generate an initial set of inputs for the given target program, and then uses an iterative process of grammar generalization to proactively generate additional inputs to infer grammars generalized from these initial seed inputs. To efficiently search for target generalizations in a huge search space of candidate generalization operators, REINAM includes a novel formulation of the search problem as a reinforcement learning problem. Our evaluation on eleven real-world benchmarks shows that REINAM outperforms an existing state-of-the-art approach on precision and recall of synthesized grammars, and fuzz testing based on REINAM substantially increases the coverage of the space of valid inputs. REINAM is able to synthesize a grammar covering the entire valid input space for some benchmarks without decreasing the accuracy of the grammar.

Original languageEnglish (US)
Title of host publicationESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsSven Apel, Marlon Dumas, Alessandra Russo, Dietmar Pfahl
PublisherAssociation for Computing Machinery
Pages488-498
Number of pages11
ISBN (Electronic)9781450355728
DOIs
StatePublished - Aug 12 2019
Event27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019 - Tallinn, Estonia
Duration: Aug 26 2019Aug 30 2019

Publication series

NameESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019
Country/TerritoryEstonia
CityTallinn
Period8/26/198/30/19

Keywords

  • Dynamic symbolic execution
  • Fuzzing
  • Grammar synthesis
  • Reinforcement learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Fingerprint

Dive into the research topics of 'REINAM: Reinforcement learning for input-grammar inference'. Together they form a unique fingerprint.

Cite this