Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT

Chunqiu Steven Xia, Lingming Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily on handcrafted or mined bug fixing patterns and cannot easily generalize to other bug/fix types. To address this limitation, recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. Such LLM-based APR tools work by first constructing an input prompt built using the original buggy code and then querying the LLM to either fill-in (cloze-style APR) the correct code at the bug location or to produce a completely new code snippet as the patch. While the LLM-based APR tools are able to achieve state-of-the-art results, they still follow the classic Generate and Validate (GV) repair paradigm of first generating lots of patches by sampling from the same initial prompt and then validating each one afterwards. This not only leads to many repeated patches that are incorrect, but also misses the crucial and yet previously ignored information in test failures as well as in plausible patches. To address these aforementioned limitations, we propose ChatRepair, the first fully automated conversation-driven APR approach that interleaves patch generation with instant feedback to perform APR in a conversational style. ChatRepair first feeds the LLM with relevant test failure information to start with, and then learns from both failures and successes of earlier patching attempts of the same bug for more powerful APR. For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch. In this way, we can avoid making the same mistakes. For earlier patches that passed all the tests (i.e., plausible patches), we further ask the LLM to generate alternative variations of the original plausible patches. In this way, we can further build on and learn from earlier successes to generate more plausible patches to increase the chance of having correct patches. While our approach is general, we implement ChatRepair using state-of-the-art dialogue-based LLM - ChatGPT. Our evaluation on the widely studied Defects4j dataset shows that ChatRepair is able to achieve the new state-of-the-art in repair performance, achieving 114 and 48 correct fixes on Defects4j 1.2 and 2.0 respectively. By calculating the cost of accessing ChatGPT, we can fix 162 out of 337 bugs for $0.42

Original languageEnglish (US)
Title of host publicationISSTA 2024 - Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
EditorsMaria Christakis, Michael Pradel
PublisherAssociation for Computing Machinery
Pages819-831
Number of pages13
ISBN (Electronic)9798400706127
DOIs
StatePublished - Sep 11 2024
Event33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2024 - Vienna, Austria
Duration: Sep 16 2024Sep 20 2024

Publication series

NameISSTA 2024 - Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

Conference

Conference33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2024
Country/TerritoryAustria
CityVienna
Period9/16/249/20/24

Keywords

  • Automated Program Repair
  • Large Language Model

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT'. Together they form a unique fingerprint.

Cite this