Automated Program Repair in the Era of Large Pre-trained Language Models

Chunqiu Steven Xia, Yuxiang Wei, Lingming Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to the reliance on bug-fixing datasets to craft fix templates (traditional) or directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained using billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged LLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B parameters in size. We design 3 different repair settings to evaluate the different ways we can use LLMs to generate patches: 1) generate the entire patch function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare different LLMs in the number of bugs fixed, generation speed, and compilation rate. We also compare the LLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied LLMs, the scaling effect exists for APR, where larger models tend to achieve better performance. Also, we show for the first time that suffix code after the buggy line (adopted in infilling-style APR) is important not only for generating more fixes but also for generating patches with a higher compilation rate. Besides patch generation, the LLMs consider correct patches to be more natural than other ones, and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that LLM-based APR can be further substantially boosted via: 1) increasing the sample size, and 2) incorporating fix template information.
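The three repair settings and the naturalness-based ranking described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `BuggyFunction` type, the `<INFILL>` placeholder, the prompt wording, and the choice to give the single-line setting only the prefix are all assumptions made for illustration, and the token log-probabilities are assumed to come from whatever model scores the patch.

```python
from dataclasses import dataclass

MASK = "<INFILL>"  # hypothetical placeholder; real infilling models use their own mask tokens


@dataclass
class BuggyFunction:
    lines: list[str]  # source lines of the buggy function
    bug_start: int    # index of the first buggy line
    bug_end: int      # index one past the last buggy line


def complete_function_input(fn: BuggyFunction) -> str:
    """Setting 1: show the whole buggy function and ask for a full rewrite."""
    return "# Buggy function\n" + "\n".join(fn.lines) + "\n# Fixed function\n"


def infill_input(fn: BuggyFunction) -> str:
    """Setting 2: keep prefix and suffix, mask the buggy chunk."""
    prefix = "\n".join(fn.lines[:fn.bug_start])
    suffix = "\n".join(fn.lines[fn.bug_end:])
    return prefix + "\n" + MASK + "\n" + suffix


def single_line_input(fn: BuggyFunction) -> str:
    """Setting 3: the bug is one line; the model is asked for one replacement line.

    Assumption: only the prefix is provided here.
    """
    if fn.bug_end - fn.bug_start != 1:
        raise ValueError("single-line setting needs exactly one buggy line")
    return "\n".join(fn.lines[:fn.bug_start]) + "\n"


def mean_entropy(token_logprobs: list[float]) -> float:
    """Average negative log-probability per token; lower means more natural."""
    return -sum(token_logprobs) / len(token_logprobs)


def rank_patches(scored: dict[str, list[float]]) -> list[str]:
    """Order candidate patches from most to least natural."""
    return sorted(scored, key=lambda patch: mean_entropy(scored[patch]))
```

For a two-line function whose second line is buggy, `infill_input` yields the first line followed by the placeholder, while `rank_patches` places the candidate with the lowest mean entropy first. Ranking by average rather than summed log-probability is a design choice in this sketch so that longer patches are not penalized purely for length.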

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE 2023
Publisher: IEEE Computer Society
Pages: 1482-1494
Number of pages: 13
ISBN (Electronic): 9781665457019
State: Published - 2023
Event: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 - Melbourne, Australia
Duration: May 15 2023 to May 16 2023

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023
Country/Territory: Australia
City: Melbourne
Period: 5/15/23 to 5/16/23

Keywords

  • Automated Program Repair
  • Machine Learning

ASJC Scopus subject areas

  • Software
