TY - GEN
T1 - Boosting coverage-based fault localization via graph-based representation learning
AU - Lou, Yiling
AU - Zhu, Qihao
AU - Dong, Jinhao
AU - Li, Xia
AU - Sun, Zeyu
AU - Hao, Dan
AU - Zhang, Lu
AU - Zhang, Lingming
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/8/20
Y1 - 2021/8/20
N2 - Coverage-based fault localization has been extensively studied in the literature due to its effectiveness and lightweightness for real-world systems. However, existing techniques often utilize coverage in an oversimplified way by abstracting detailed coverage into numbers of tests or boolean vectors, thus limiting their effectiveness in practice. In this work, we present a novel coverage-based fault localization technique, GRACE, which fully utilizes detailed coverage information with graph-based representation learning. Our intuition is that coverage can be regarded as connective relationships between tests and program entities, which can be inherently and integrally represented by a graph structure: with tests and program entities as nodes, while with coverage and code structures as edges. Therefore, we first propose a novel graph-based representation to reserve all detailed coverage information and fine-grained code structures into one graph. Then we leverage Gated Graph Neural Network to learn valuable features from the graph-based coverage representation and rank program entities in a listwise way. Our evaluation on the widely used benchmark Defects4J (V1.2.0) shows that GRACE significantly outperforms state-of-the-art coverage-based fault localization: GRACE localizes 195 bugs within Top-1 whereas the best compared technique can at most localize 166 bugs within Top-1. We further investigate the impact of each GRACE component and find that they all positively contribute to GRACE. In addition, our results also demonstrate that GRACE has learnt essential features from coverage, which are complementary to various information used in existing learning-based fault localization. Finally, we evaluate GRACE in the cross-project prediction scenario on extra 226 bugs from Defects4J (V2.0.0), and find that GRACE consistently outperforms state-of-the-art coverage-based techniques.
AB - Coverage-based fault localization has been extensively studied in the literature due to its effectiveness and lightweightness for real-world systems. However, existing techniques often utilize coverage in an oversimplified way by abstracting detailed coverage into numbers of tests or boolean vectors, thus limiting their effectiveness in practice. In this work, we present a novel coverage-based fault localization technique, GRACE, which fully utilizes detailed coverage information with graph-based representation learning. Our intuition is that coverage can be regarded as connective relationships between tests and program entities, which can be inherently and integrally represented by a graph structure: with tests and program entities as nodes, while with coverage and code structures as edges. Therefore, we first propose a novel graph-based representation to reserve all detailed coverage information and fine-grained code structures into one graph. Then we leverage Gated Graph Neural Network to learn valuable features from the graph-based coverage representation and rank program entities in a listwise way. Our evaluation on the widely used benchmark Defects4J (V1.2.0) shows that GRACE significantly outperforms state-of-the-art coverage-based fault localization: GRACE localizes 195 bugs within Top-1 whereas the best compared technique can at most localize 166 bugs within Top-1. We further investigate the impact of each GRACE component and find that they all positively contribute to GRACE. In addition, our results also demonstrate that GRACE has learnt essential features from coverage, which are complementary to various information used in existing learning-based fault localization. Finally, we evaluate GRACE in the cross-project prediction scenario on extra 226 bugs from Defects4J (V2.0.0), and find that GRACE consistently outperforms state-of-the-art coverage-based techniques.
KW - Fault Localization
KW - Graph Neural Network
KW - Representation Learning
UR - http://www.scopus.com/inward/record.url?scp=85116236384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116236384&partnerID=8YFLogxK
U2 - 10.1145/3468264.3468580
DO - 10.1145/3468264.3468580
M3 - Conference contribution
AN - SCOPUS:85116236384
T3 - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 664
EP - 676
BT - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Spinellis, Diomidis
PB - Association for Computing Machinery
T2 - 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Y2 - 23 August 2021 through 28 August 2021
ER -