TY - GEN
T1 - Identifying security bug reports via text mining
T2 - 7th IEEE Working Conference on Mining Software Repositories, MSR 2010, Co-located with the 2010 ACM/IEEE International Conference on Software Engineering, ICSE 2010
AU - Gegick, Michael
AU - Rotella, Pete
AU - Xie, Tao
PY - 2010
Y1 - 2010
N2 - A bug-tracking system such as Bugzilla contains bug reports (BRs) collected from various sources such as development teams, testing teams, and end users. When bug reporters submit bug reports to a bug-tracking system, the bug reporters need to label the bug reports as security bug reports (SBRs) or not, to indicate whether the involved bugs are security problems. These SBRs generally deserve higher priority in bug fixing than not-security bug reports (NSBRs). However, in the bug-reporting process, bug reporters often mislabel SBRs as NSBRs partly due to lack of security domain knowledge. This mislabeling could cause serious damage to software-system stakeholders due to the induced delay of identifying and fixing the involved security bugs. To address this important issue, we developed a new approach that applies text mining on natural-language descriptions of BRs to train a statistical model on already manually-labeled BRs to identify SBRs that are manually-mislabeled as NSBRs. Security engineers can use the model to automate the classification of BRs from large bug databases to reduce the time that they spend on searching for SBRs. We evaluated the model's predictions on a large Cisco software system with over ten million source lines of code. Among a sample of BRs that Cisco bug reporters manually labeled as NSBRs in bug reporting, our model successfully classified a high percentage (78%) of the SBRs as verified by Cisco security engineers, and predicted their classification as SBRs with a probability of at least 0.98.
UR - http://www.scopus.com/inward/record.url?scp=77953738151&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953738151&partnerID=8YFLogxK
U2 - 10.1109/MSR.2010.5463340
DO - 10.1109/MSR.2010.5463340
M3 - Conference contribution
AN - SCOPUS:77953738151
SN - 9781424468034
T3 - Proceedings - International Conference on Software Engineering
SP - 11
EP - 20
BT - Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, MSR 2010, Co-located with ICSE 2010
Y2 - 2 May 2010 through 3 May 2010
ER -