Deep just-in-time defect prediction: How far are we?

Zhengran Zeng, Yuqun Zhang, Haotian Zhang, Lingming Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Defect prediction aims to automatically identify potential defective code with minimal human intervention and has been widely studied in the literature. Just-in-Time (JIT) defect prediction focuses on program changes rather than whole programs, and has been widely adopted in continuous testing. CC2Vec, state-of-the-art JIT defect prediction tool, first constructs a hierarchical attention network (HAN) to learn distributed vector representations of both code additions and deletions, and then concatenates them with two other embedding vectors representing commit messages and overall code changes extracted by the existing DeepJIT approach to train a model for predicting whether a given commit is defective. Although CC2Vec has been shown to be the state of the art for JIT defect prediction, it was only evaluated on a limited dataset and not compared with all representative baselines. Therefore, to further investigate the efficacy and limitations of CC2Vec, this paper performs an extensive study of CC2Vec on a large-scale dataset with over 310,370 changes (8.3 X larger than the original CC2Vec dataset). More specifically, we also empirically compare CC2Vec against DeepJIT and representative traditional JIT defect prediction techniques. The experimental results show that CC2Vec cannot consistently outperform DeepJIT, and neither of them can consistently outperform traditional JIT defect prediction. We also investigate the impact of individual traditional defect prediction features and find that the added-line-number feature outperforms other traditional features. Inspired by this finding, we construct a simplistic JIT defect prediction approach which simply adopts the added-line-number feature with the logistic regression classifier. Surprisingly, such a simplistic approach can outperform CC2Vec and DeepJIT in defect prediction, and can be 81k X/120k X faster in training/testing. Furthermore, the paper also provides various practical guidelines for advancing JIT defect prediction in the near future.

Original languageEnglish (US)
Title of host publicationISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
EditorsCristian Cadar, Xiangyu Zhang
PublisherAssociation for Computing Machinery
Pages427-438
Number of pages12
ISBN (Electronic)9781450384599
DOIs
StatePublished - Jul 11 2021
Event30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021 - Virtual, Online, Denmark
Duration: Jul 11 2021Jul 17 2021

Publication series

NameISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Conference

Conference30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021
Country/TerritoryDenmark
CityVirtual, Online
Period7/11/217/17/21

Keywords

  • Deep Learning
  • Just-In-Time Prediction
  • Software Defect Prediction

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'Deep just-in-time defect prediction: How far are we?'. Together they form a unique fingerprint.

Cite this