Bug characteristics in open source software

Lin Tan, Chen Liu, Zhenmin Li, Xuanhui Wang, Yuanyuan Zhou, Chengxiang Zhai

Research output: Contribution to journal › Article › peer-review


Designing effective tools for detecting and recovering from software failures requires a deep understanding of software bug characteristics. We study these characteristics by sampling 2,060 real-world bugs from three large, representative open-source projects: the Linux kernel, Mozilla, and Apache. We manually study these bugs along three dimensions: root causes, impacts, and components. We further study the correlation between categories in different dimensions and the trends of different types of bugs. The findings include: (1) semantic bugs are the dominant root cause, and as software evolves, semantic bugs increase while memory-related bugs decrease, calling for more research effort to address semantic bugs; (2) the Linux kernel, an operating system (OS), has more concurrency bugs than its non-OS counterparts, suggesting that more effort should go into detecting concurrency bugs in operating system code; and (3) reported security bugs are increasing, and the majority of them are caused by semantic bugs, suggesting that developers need more support to diagnose and fix security bugs, especially semantic security bugs. In addition, to reduce the manual effort of building bug benchmarks for evaluating bug detection and diagnosis tools, we use machine learning techniques to classify 109,014 bugs automatically.

Original language: English (US)
Pages (from-to): 1665-1705
Number of pages: 41
Journal: Empirical Software Engineering
Issue number: 6
State: Published - Oct 12 2014


Keywords

  • Bug detection
  • Empirical study
  • Open source
  • Software bug characteristics
  • Software reliability

ASJC Scopus subject areas

  • Software

