TY - GEN
T1 - Mitigating the effects of flaky tests on mutation testing
AU - Shi, August
AU - Bell, Jonathan
AU - Marinov, Darko
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/7/10
Y1 - 2019/7/10
N2 - Mutation testing is widely used in research as a metric for evaluating the quality of test suites. Mutation testing runs the test suite on generated mutants (variants of the code under test) where a test suite kills a mutant if any of the tests fail when run on the mutant. Mutation testing implicitly assumes that tests exhibit deterministic behavior, in terms of their coverage and the outcome of a test (not) killing a certain mutant. Such an assumption does not hold in the presence of flaky tests, whose outcomes can non-deterministically differ even when run on the same code under test. Without reliable test outcomes, mutation testing can result in unreliable results, e.g., in our experiments, mutation scores vary by four percentage points on average between repeated executions, and 9% of mutant-test pairs have an unknown status. Many modern software projects suffer from flaky tests. We propose techniques that manage flakiness throughout the mutation testing process, largely based on strategically re-running tests. We implement our techniques by modifying the open-source mutation testing tool, PIT. Our evaluation on 30 projects shows that our techniques reduce the number of “unknown” (flaky) mutants by 79.4%.
AB - Mutation testing is widely used in research as a metric for evaluating the quality of test suites. Mutation testing runs the test suite on generated mutants (variants of the code under test) where a test suite kills a mutant if any of the tests fail when run on the mutant. Mutation testing implicitly assumes that tests exhibit deterministic behavior, in terms of their coverage and the outcome of a test (not) killing a certain mutant. Such an assumption does not hold in the presence of flaky tests, whose outcomes can non-deterministically differ even when run on the same code under test. Without reliable test outcomes, mutation testing can result in unreliable results, e.g., in our experiments, mutation scores vary by four percentage points on average between repeated executions, and 9% of mutant-test pairs have an unknown status. Many modern software projects suffer from flaky tests. We propose techniques that manage flakiness throughout the mutation testing process, largely based on strategically re-running tests. We implement our techniques by modifying the open-source mutation testing tool, PIT. Our evaluation on 30 projects shows that our techniques reduce the number of “unknown” (flaky) mutants by 79.4%.
KW - Flaky tests
KW - Mutation testing
KW - Non-deterministic coverage
UR - http://www.scopus.com/inward/record.url?scp=85070631430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070631430&partnerID=8YFLogxK
U2 - 10.1145/3293882.3330568
DO - 10.1145/3293882.3330568
M3 - Conference contribution
AN - SCOPUS:85070631430
T3 - ISSTA 2019 - Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis
SP - 296
EP - 306
BT - ISSTA 2019 - Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis
A2 - Zhang, Dongmei
A2 - Møller, Anders
PB - Association for Computing Machinery
T2 - 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2019
Y2 - 15 July 2019 through 19 July 2019
ER -