Abstract
Many recent automated bug localization techniques require a sufficient number of bug-fix records to train a predictive model. In practice, however, many projects cannot supply them, especially new projects in their first release, because constructing a large set of bug-fix records demands expensive human effort. To capture the potential relevance distribution between bug reports and code files from a limited number of available bug-fix records, we present BL-GAN, the first semi-supervised bug localization model. BL-GAN introduces a Generative Adversarial Network in which synthetic bug-fix records close to the real ones are constructed by searching the project directory tree to generate file paths, instead of traversing the contents of all code files. To process bug reports, BL-GAN adopts an attention-based Transformer architecture that captures semantic and sequence information. To capture the structural information specific to code files, BL-GAN incorporates a novel multilayer Graph Convolutional Network that processes the source code as a graph. Extensive experiments on large-scale real-world datasets show that BL-GAN significantly outperforms the state of the art on all evaluation measures.
Original language | English (US) |
---|---|
Pages (from-to) | 11112-11125 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 35 |
Issue number | 11 |
State | Published - Nov 1 2023 |
Keywords
- Bug localization
- bug report
- generative adversarial network
- semi-supervised learning
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics