TY - JOUR
T1 - BL-GAN
T2 - Semi-Supervised Bug Localization via Generative Adversarial Network
AU - Zhu, Ziye
AU - Tong, Hanghang
AU - Wang, Yu
AU - Li, Yun
N1 - This work was supported by the National Natural Science Foundation of China under Grant 61772284. The work of Hanghang Tong was supported by NSF under Grant 1947135.
PY - 2023/11/1
Y1 - 2023/11/1
AB - Various automated bug localization technologies have recently emerged that require adequate bug-fix records available to train a predictive model. However, many projects in practice might not provide these necessities, especially for new projects in the first release, due to the expensive human effort for constructing a large amount of bug-fix records. Aiming to capture the potential relevance distribution between the bug report and code file from a limited number of available bug-fix records, we present the first semi-supervised bug localization model named BL-GAN in this paper. For this purpose, the promising Generative Adversarial Network is introduced in BL-GAN, in which synthetic bug-fix records close to the real ones are constructed by searching the project directory tree to generate file paths instead of traversing the contents of all code files. For processing bug reports, the proposed BL-GAN adopts an attention-based Transformer architecture to capture semantic and sequence information. In order to capture the proprietary structural information in code files, BL-GAN incorporates a novel multilayer Graph Convolutional Network to process the source code in a graphical view. Extensive experiments on large-scale real-world datasets reveal that our model BL-GAN significantly outperforms the state-of-the-art on all evaluation measures.
KW - Bug localization
KW - bug report
KW - generative adversarial network
KW - semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85144084847&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144084847&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2022.3225329
DO - 10.1109/TKDE.2022.3225329
M3 - Article
AN - SCOPUS:85144084847
SN - 1041-4347
VL - 35
SP - 11112
EP - 11125
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 11
ER -