TY - GEN
T1 - How good are the specs? A study of the bug-finding effectiveness of existing Java API specifications
AU - Legunsen, Owolabi
AU - Ul Hassan, Wajih
AU - Xu, Xinyue
AU - Roşu, Grigore
AU - Marinov, Darko
N1 - This research was partially supported by the NSF Grants CCF-1421503, CCF-1421575, CCF-1438982, and CCF-1439957. Wajih Ul Hassan was partially supported by the Sohaib and Sara Abassi Fellowship.
PY - 2016/8/25
Y1 - 2016/8/25
AB - Runtime verification can be used to find bugs early, during software development, by monitoring test executions against formal specifications (specs). The quality of runtime verification depends on the quality of the specs. While previous research has produced many specs for the Java API, manually or through automatic mining, there has been no large-scale study of their bug-finding effectiveness. We present the first in-depth study of the bug-finding effectiveness of previously proposed specs. We used JavaMOP to monitor 182 manually written and 17 automatically mined specs against more than 18K manually written and 2.1M automatically generated tests in 200 open-source projects. The average runtime overhead was under 4.3×. We inspected 652 violations of manually written specs and (randomly sampled) 200 violations of automatically mined specs. We reported 95 bugs, out of which developers already fixed 74. However, most violations, 82.81% of 652 and 97.89% of 200, were false alarms. Our empirical results show that (1) runtime verification technology has matured enough to incur tolerable runtime overhead during testing, and (2) the existing API specifications can find many bugs that developers are willing to fix; however, (3) the false alarm rates are worrisome and suggest that substantial effort needs to be spent on engineering better specs and properly evaluating their effectiveness.
KW - Empirical study
KW - Runtime verification
KW - Specification quality
UR - https://www.scopus.com/pages/publications/84989172598
UR - https://www.scopus.com/pages/publications/84989172598#tab=citedBy
U2 - 10.1145/2970276.2970356
DO - 10.1145/2970276.2970356
M3 - Conference contribution
AN - SCOPUS:84989172598
T3 - ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
SP - 602
EP - 613
BT - ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
A2 - Khurshid, Sarfraz
A2 - Lo, David
A2 - Apel, Sven
PB - Association for Computing Machinery
T2 - 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016
Y2 - 3 September 2016 through 7 September 2016
ER -