TY - GEN
T1 - An Empirical Comparison of Mutant Selection Assessment Metrics
AU - Zhang, Jie M.
AU - Zhang, Lingming
AU - Hao, Dan
AU - Zhang, Lu
AU - Harman, Mark
N1 - Funding Information:
This work was supported by National Key Research and Development Plan (2016YFA0203200) and the National Natural Science Foundation of China (Grant Nos. 51538013 and 51138009).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
N2 - Mutation testing is expensive due to the large number of mutants, a problem typically tackled using selective techniques, thereby raising the fundamental question of how to evaluate the selection process. Existing mutant selection approaches rely on one of two types of metrics (or assessment criteria), one based on adequate test sets and the other based on inadequate test sets. This raises the question as to whether these two metrics are correlated, complementary or substitutable for one another. The tester's faith in mutant selection as well as the validity of previous research work using only one metric rely on the answer to this question, yet it currently remains unanswered. To answer it, we perform qualitative and quantitative comparisons with 104 different projects, consisting of over 600,000 lines of code. Our results indicate a strong connection between the two types of metrics (R²=0.8622 on average). The strategy for dealing with equivalent mutants and test density is observed to have a negligible impact for mutant selection.
AB - Mutation testing is expensive due to the large number of mutants, a problem typically tackled using selective techniques, thereby raising the fundamental question of how to evaluate the selection process. Existing mutant selection approaches rely on one of two types of metrics (or assessment criteria), one based on adequate test sets and the other based on inadequate test sets. This raises the question as to whether these two metrics are correlated, complementary or substitutable for one another. The tester's faith in mutant selection as well as the validity of previous research work using only one metric rely on the answer to this question, yet it currently remains unanswered. To answer it, we perform qualitative and quantitative comparisons with 104 different projects, consisting of over 600,000 lines of code. Our results indicate a strong connection between the two types of metrics (R²=0.8622 on average). The strategy for dealing with equivalent mutants and test density is observed to have a negligible impact for mutant selection.
KW - Assessment metrics
KW - Mutant selection
KW - Mutation testing
UR - http://www.scopus.com/inward/record.url?scp=85068399377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068399377&partnerID=8YFLogxK
U2 - 10.1109/ICSTW.2019.00037
DO - 10.1109/ICSTW.2019.00037
M3 - Conference contribution
AN - SCOPUS:85068399377
T3 - Proceedings - 2019 IEEE 12th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2019
SP - 90
EP - 101
BT - Proceedings - 2019 IEEE 12th International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2019
Y2 - 22 April 2019 through 27 April 2019
ER -