TY - GEN
T1 - Bug Localization via Supervised Topic Modeling
AU - Wang, Yaojing
AU - Yao, Yuan
AU - Tong, Hanghang
AU - Huo, Xuan
AU - Li, Min
AU - Xu, Feng
AU - Lu, Jian
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
N2 - Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.
AB - Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.
KW - Bug localization
KW - Bug report
KW - Supervised topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85061393845&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061393845&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2018.00076
DO - 10.1109/ICDM.2018.00076
M3 - Conference contribution
AN - SCOPUS:85061393845
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 607
EP - 616
BT - 2018 IEEE International Conference on Data Mining, ICDM 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Conference on Data Mining, ICDM 2018
Y2 - 17 November 2018 through 20 November 2018
ER -