TY - GEN
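T1 - SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications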
T2 - 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
AU - Zhong, Zexuan
AU - Guo, Jiaqi
AU - Yang, Wei
AU - Peng, Jian
AU - Xie, Tao
AU - Lou, Jian Guang
AU - Liu, Ting
AU - Zhang, Dongmei
N1 - Funding Information:
The work from the authors at the University of Illinois at Urbana-Champaign was supported in part by National Science Foundation under grants no. CNS-1513939, CNS-1564274, and CCF-1816615. The work from the authors at Xi'an Jiaotong University was supported by National Natural Science Foundation of China (61632015, 61772408, 61721002).
Publisher Copyright:
© 2018 Association for Computational Linguistics
PY - 2018
Y1 - 2018
AB - Recent research proposes syntax-based approaches to address the problem of generating programs from natural language specifications. These approaches typically train a sequence-to-sequence learning model using a syntax-based objective: maximum likelihood estimation (MLE). Such syntax-based approaches do not effectively address the goal of generating semantically correct programs, because these approaches fail to handle Program Aliasing, i.e., semantically equivalent programs may have many syntactically different forms. To address this issue, in this paper, we propose a semantics-based approach named SemRegex. SemRegex provides solutions for a subtask of the program-synthesis problem: generating regular expressions from natural language. Different from the existing syntax-based approaches, SemRegex trains the model by maximizing the expected semantic correctness of the generated regular expressions. The semantic correctness is measured using the DFA-equivalence oracle, random test cases, and distinguishing test cases. The experiments on three public datasets demonstrate the superiority of SemRegex over the existing state-of-the-art approaches.
UR - http://www.scopus.com/inward/record.url?scp=85081728039&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081728039&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85081728039
T3 - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
SP - 1608
EP - 1618
BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
A2 - Riloff, Ellen
A2 - Chiang, David
A2 - Hockenmaier, Julia
A2 - Tsujii, Jun'ichi
PB - Association for Computational Linguistics
Y2 - 31 October 2018 through 4 November 2018
ER -