TY - JOUR
T1 - An Empirical Study of Boosting Spectrum-Based Fault Localization via PageRank
AU - Zhang, Mengshi
AU - Li, Yaoxian
AU - Li, Xia
AU - Chen, Lingchao
AU - Zhang, Yuqun
AU - Zhang, Lingming
AU - Khurshid, Sarfraz
N1 - Funding Information:
This work was supported by the Ministry of Science and Technology of China (Grant No. 2017YFC0804002), Shenz-hen Peacock Plan (Grant No.KQTD2016112514355531), Science and Technology Innovation Committee Foundation of Shenzhen (Grant No.ZDSYS201703031748284 and No. JCYJ20170817110848086), US National Science Foundation grants (CCF-1566589, CCF-1763906, CCF-1704790 and CCF-1718903), Amazon, Futurewei, and Samsung.
Publisher Copyright:
© 1976-2012 IEEE.
PY - 2021/6/1
Y1 - 2021/6/1
N2 - Manual debugging is notoriously tedious and time-consuming. Therefore, various automated fault localization techniques have been proposed to help with manual debugging. Among the existing fault localization techniques, spectrum-based fault localization (SBFL) is one of the most widely studied techniques due to being lightweight. The focus of the existing SBFL techniques is to consider how to differentiate program entities (i.e., one dimension in program spectra); indeed, this focus is aligned with the ultimate goal of finding the faulty lines of code. Our key insight is to enhance the existing SBFL techniques by additionally considering how to differentiate tests (i.e., the other dimension in program spectra), which, to the best of our knowledge, has not been studied in prior work. We present our basic approach, PRFL, a lightweight technique that boosts SBFL by differentiating tests using PageRank algorithm. Specifically, given the original program spectrum information, PRFL uses PageRank to recompute the spectrum by considering the contributions of different tests. Next, traditional SBFL techniques are applied on the recomputed spectrum to achieve more effective fault localization. On top of PRFL, we explore PRFL+ and PRFL_{\mathrm{MA}} MA , two variants which extend PRFL by optimizing its components and integrating Method-level Aggregation technique, respectively. Though being simple and lightweight, PRFL has been demonstrated to outperform state-of-the-art SBFL techniques significantly (e.g., ranking 39.2% / 82.3% more real/artificial faults at Top-1 compared with the most effective traditional SBFL technique) with low overhead (e.g., around 6 minutes average extra overhead on real faults) on 395 real faults from 6 Defects4J projects and 96925 artificial (i.e., mutation) faults from 240 GitHub projects. To further validate PRFL's effectiveness, we compare PRFL with multiple recent proposed fault localization techniques (e.g., Multric, Metallaxis and MBFL-hybrid-avg), and the experimental results show that PRFL outperforms them as well. Furthermore, we study the performance of PRFL_{\mathrm{MA}} MA , and the experimental results present it can locate 137 real faults (73.4% / 24.5% more compared with the most effective SBFL/PRFL technique) and 35058 artificial faults (159.6% / 28.1% more than SBFL/PRFL technique) at Top-1. At last, we study the generalizability of PRFL on another benchmark, Bugs.jar, and the result shows PRFL can help locate around 30 percent more faults at Top 1.
AB - Manual debugging is notoriously tedious and time-consuming. Therefore, various automated fault localization techniques have been proposed to help with manual debugging. Among the existing fault localization techniques, spectrum-based fault localization (SBFL) is one of the most widely studied techniques due to being lightweight. The focus of the existing SBFL techniques is to consider how to differentiate program entities (i.e., one dimension in program spectra); indeed, this focus is aligned with the ultimate goal of finding the faulty lines of code. Our key insight is to enhance the existing SBFL techniques by additionally considering how to differentiate tests (i.e., the other dimension in program spectra), which, to the best of our knowledge, has not been studied in prior work. We present our basic approach, PRFL, a lightweight technique that boosts SBFL by differentiating tests using PageRank algorithm. Specifically, given the original program spectrum information, PRFL uses PageRank to recompute the spectrum by considering the contributions of different tests. Next, traditional SBFL techniques are applied on the recomputed spectrum to achieve more effective fault localization. On top of PRFL, we explore PRFL+ and PRFL_{\mathrm{MA}} MA , two variants which extend PRFL by optimizing its components and integrating Method-level Aggregation technique, respectively. Though being simple and lightweight, PRFL has been demonstrated to outperform state-of-the-art SBFL techniques significantly (e.g., ranking 39.2% / 82.3% more real/artificial faults at Top-1 compared with the most effective traditional SBFL technique) with low overhead (e.g., around 6 minutes average extra overhead on real faults) on 395 real faults from 6 Defects4J projects and 96925 artificial (i.e., mutation) faults from 240 GitHub projects. To further validate PRFL's effectiveness, we compare PRFL with multiple recent proposed fault localization techniques (e.g., Multric, Metallaxis and MBFL-hybrid-avg), and the experimental results show that PRFL outperforms them as well. Furthermore, we study the performance of PRFL_{\mathrm{MA}} MA , and the experimental results present it can locate 137 real faults (73.4% / 24.5% more compared with the most effective SBFL/PRFL technique) and 35058 artificial faults (159.6% / 28.1% more than SBFL/PRFL technique) at Top-1. At last, we study the generalizability of PRFL on another benchmark, Bugs.jar, and the result shows PRFL can help locate around 30 percent more faults at Top 1.
KW - automated debugging
KW - PageRank analysis
KW - SBFL
KW - Software testing
KW - spectrum-based fault localization
UR - http://www.scopus.com/inward/record.url?scp=85110356795&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110356795&partnerID=8YFLogxK
U2 - 10.1109/TSE.2019.2911283
DO - 10.1109/TSE.2019.2911283
M3 - Article
AN - SCOPUS:85110356795
SN - 0098-5589
VL - 47
SP - 1089
EP - 1113
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 6
M1 - 8698881
ER -