DOI: 10.1145/3643661.3643951

An Empirical Comparison of Code Generation Approaches for Ansible

Published: 07 August 2024

Abstract

The rapid proliferation of LLM-based programming assistants has enabled fast and accurate automatic code generation for general-purpose programming languages. Domain-specific languages such as Ansible, a DSL for IT automation, remain poorly supported despite being critical to many fields, owing to the limited amount of permissively licensed code available for training models and a lack of interest from tool developers. To address this gap, we collect a novel dataset of permissively licensed Ansible code and use it to create Warp, a code LLM fine-tuned to produce Ansible tasks from natural language prompts. We evaluate state-of-the-art LLM-based code generation models, comparing several common strategies, including fine-tuning base models on Ansible code and retrieval-augmented generation over documentation, in order to understand the challenges facing existing methodology and to identify future research directions toward better code generation for DSLs.
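To make the task concrete: given a natural-language prompt, a model like Warp is meant to emit an Ansible task in YAML. The sketch below illustrates this prompt-to-task mapping; the prompt and the generated tasks are hypothetical examples for illustration, not taken from the paper's dataset or evaluation.

# Prompt: "Install nginx and make sure it is running on boot."
# Hypothetical model output: two Ansible tasks using builtin modules.
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present

- name: Ensure nginx is started and enabled at boot
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true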



Published In

InteNSE '24: Proceedings of the ACM/IEEE 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering
April 2024
24 pages
ISBN: 9798400705649
DOI: 10.1145/3643661
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. large language models
  2. code generation
  3. domain specific languages
  4. ansible

Qualifiers

  • Research-article

Conference

InteNSE '24

