TY - JOUR
T1 - Bug characteristics in open source software
AU - Tan, Lin
AU - Liu, Chen
AU - Li, Zhenmin
AU - Wang, Xuanhui
AU - Zhou, Yuanyuan
AU - Zhai, Chengxiang
N1 - Acknowledgements: We thank Luyang Wang and Yaoqiang Li for classifying some bug reports. We thank Shan Lu for the early discussion and feedback. The work is partially supported by the Natural Sciences and Engineering Research Council of Canada, the United States National Science Foundation, the United States Department of Energy, a Google gift grant, and an Intel gift grant.
PY - 2014/10/12
Y1 - 2014/10/12
AB - To design effective tools for detecting and recovering from software failures requires a deep understanding of software bug characteristics. We study software bug characteristics by sampling 2,060 real world bugs in three large, representative open-source projects—the Linux kernel, Mozilla, and Apache. We manually study these bugs in three dimensions—root causes, impacts, and components. We further study the correlation between categories in different dimensions, and the trend of different types of bugs. The findings include: (1) semantic bugs are the dominant root cause. As software evolves, semantic bugs increase, while memory-related bugs decrease, calling for more research effort to address semantic bugs; (2) the Linux kernel operating system (OS) has more concurrency bugs than its non-OS counterparts, suggesting more effort into detecting concurrency bugs in operating system code; and (3) reported security bugs are increasing, and the majority of them are caused by semantic bugs, suggesting more support to help developers diagnose and fix security bugs, especially semantic security bugs. In addition, to reduce the manual effort in building bug benchmarks for evaluating bug detection and diagnosis tools, we use machine learning techniques to classify 109,014 bugs automatically.
KW - Bug detection
KW - Empirical study
KW - Open source
KW - Software bug characteristics
KW - Software reliability
UR - https://www.scopus.com/pages/publications/84909994867
UR - https://www.scopus.com/inward/citedby.url?scp=84909994867&partnerID=8YFLogxK
U2 - 10.1007/s10664-013-9258-8
DO - 10.1007/s10664-013-9258-8
M3 - Article
AN - SCOPUS:84909994867
SN - 1382-3256
VL - 19
SP - 1665
EP - 1705
JO - Empirical Software Engineering
JF - Empirical Software Engineering
IS - 6
ER -