Bug Localization via Supervised Topic Modeling

Yaojing Wang, Yuan Yao, Hanghang Tong, Xuan Huo, Min Li, Feng Xu, Jian Lu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.

Original languageEnglish (US)
Title of host publication2018 IEEE International Conference on Data Mining, ICDM 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages607-616
Number of pages10
ISBN (Electronic)9781538691588
DOIs
StatePublished - Dec 27 2018
Externally publishedYes
Event18th IEEE International Conference on Data Mining, ICDM 2018 - Singapore, Singapore
Duration: Nov 17 2018Nov 20 2018

Publication series

NameProceedings - IEEE International Conference on Data Mining, ICDM
Volume2018-November
ISSN (Print)1550-4786

Conference

Conference18th IEEE International Conference on Data Mining, ICDM 2018
Country/TerritorySingapore
CitySingapore
Period11/17/1811/20/18

Keywords

  • Bug localization
  • Bug report
  • Supervised topic modeling

ASJC Scopus subject areas

  • General Engineering

Fingerprint

Dive into the research topics of 'Bug Localization via Supervised Topic Modeling'. Together they form a unique fingerprint.

Cite this