TY - GEN
T1 - An Empirical Comparison of Code Generation Approaches for Ansible
AU - Darnell, Benjamin
AU - Chopra, Hetarth
AU - Councilman, Aaron
AU - Grove, David
AU - Wang, Yu-Xiong
AU - Adve, Vikram
N1 - This research was funded by the IBM Illinois Discovery Accelerator Institute under Grant IBM IIDAI W2177533 103509, and is part of the Delta research computing project, which is supported by the National Science Foundation (award OCI 2005572) and the State of Illinois. The authors would also like to thank Kastan Day and NCSA for support with the UIUC.chat tool used in the evaluation.
PY - 2024/4/15
Y1 - 2024/4/15
N2 - The rapid proliferation of LLM-based programming assistants has enabled fast and accurate automatic code generation for general-purpose programming languages. Domain-specific languages such as Ansible, a DSL for IT automation, have seen a lack of support despite being critical to many fields, due to the limited public-domain code available for training models and a lack of interest from tool developers. To address this issue, we collect a novel dataset of permissively licensed Ansible code and use it to create Warp, a code LLM fine-tuned to produce Ansible tasks from a natural language prompt. We evaluate state-of-the-art tools for LLM-based code generation, comparing multiple common strategies, including fine-tuning base models on Ansible code and retrieval-augmented generation using documentation, in order to understand the challenges with existing methodology and identify future research directions to enable better code generation for DSLs.
AB - The rapid proliferation of LLM-based programming assistants has enabled fast and accurate automatic code generation for general-purpose programming languages. Domain-specific languages such as Ansible, a DSL for IT automation, have seen a lack of support despite being critical to many fields, due to the limited public-domain code available for training models and a lack of interest from tool developers. To address this issue, we collect a novel dataset of permissively licensed Ansible code and use it to create Warp, a code LLM fine-tuned to produce Ansible tasks from a natural language prompt. We evaluate state-of-the-art tools for LLM-based code generation, comparing multiple common strategies, including fine-tuning base models on Ansible code and retrieval-augmented generation using documentation, in order to understand the challenges with existing methodology and identify future research directions to enable better code generation for DSLs.
KW - ansible
KW - code generation
KW - domain specific languages
KW - large language models
UR - http://www.scopus.com/inward/record.url?scp=85201660730&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85201660730&partnerID=8YFLogxK
U2 - 10.1145/3643661.3643951
DO - 10.1145/3643661.3643951
M3 - Conference contribution
AN - SCOPUS:85201660730
T3 - Proceedings - 2024 IEEE/ACM 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, InteNSE 2024
SP - 1
EP - 6
BT - Proceedings - 2024 IEEE/ACM 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, InteNSE 2024
PB - Association for Computing Machinery
T2 - 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, InteNSE 2024, co-located with ICSE 2024
Y2 - 15 April 2024
ER -