TY - GEN
T1 - Understanding reproducibility and characteristics of flaky tests through test reruns in Java projects
AU - Lam, Wing
AU - Winter, Stefan
AU - Astorga, Angello
AU - Stodden, Victoria
AU - Marinov, Darko
N1 - Funding Information:
We thank Anjiang Wei for helping us debug some flaky tests, Jon Bell for extensive discussions about flaky tests, and Tianyin Xu for sharing Microsoft Azure credits. This work was partially supported by NSF grant nos. CCF-1763788 and OAC-1839010, a GEM fellowship, and a Supplemental Summer Block Grant (SSBG). We acknowledge support for research on flaky tests from Facebook and Google.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - Flaky tests are tests that can nondeterministically pass and fail. They pose a major impediment to regression testing, because they provide an inconclusive assessment of whether recent code changes contain faults. Prior studies of flaky tests have proposed tools to detect flaky tests and identified various sources of flakiness in tests, e.g., order-dependent (OD) tests that deterministically fail for some order of tests in a test suite but deterministically pass for some other orders. Several of these studies have focused on OD tests. We focus on an important and under-explored source of flakiness in tests: non-order-dependent tests that can nondeterministically pass and fail even for the same order of tests. Instead of using specialized tools that aim to detect flaky tests, we run tests using the tool configured by the developers. Specifically, we perform our empirical evaluation on Java projects that rely on the Maven Surefire plugin to run tests. We re-execute each test suite 4000 times, potentially in different test-class orders, and we label tests as flaky if our runs have both pass and fail outcomes across these reruns. We obtain a dataset of 107 flaky tests and study various characteristics of these tests. We find that many tests previously called "non-order-dependent" actually do depend on the order and can fail with very different failure rates for different orders.
AB - Flaky tests are tests that can nondeterministically pass and fail. They pose a major impediment to regression testing, because they provide an inconclusive assessment of whether recent code changes contain faults. Prior studies of flaky tests have proposed tools to detect flaky tests and identified various sources of flakiness in tests, e.g., order-dependent (OD) tests that deterministically fail for some order of tests in a test suite but deterministically pass for some other orders. Several of these studies have focused on OD tests. We focus on an important and under-explored source of flakiness in tests: non-order-dependent tests that can nondeterministically pass and fail even for the same order of tests. Instead of using specialized tools that aim to detect flaky tests, we run tests using the tool configured by the developers. Specifically, we perform our empirical evaluation on Java projects that rely on the Maven Surefire plugin to run tests. We re-execute each test suite 4000 times, potentially in different test-class orders, and we label tests as flaky if our runs have both pass and fail outcomes across these reruns. We obtain a dataset of 107 flaky tests and study various characteristics of these tests. We find that many tests previously called "non-order-dependent" actually do depend on the order and can fail with very different failure rates for different orders.
KW - Flaky tests
KW - Regression testing
KW - Reproducibility
UR - http://www.scopus.com/inward/record.url?scp=85097342281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097342281&partnerID=8YFLogxK
U2 - 10.1109/ISSRE5003.2020.00045
DO - 10.1109/ISSRE5003.2020.00045
M3 - Conference contribution
AN - SCOPUS:85097342281
T3 - Proceedings - International Symposium on Software Reliability Engineering, ISSRE
SP - 403
EP - 413
BT - Proceedings - 2020 IEEE 31st International Symposium on Software Reliability Engineering, ISSRE 2020
A2 - Vieira, Marco
A2 - Madeira, Henrique
A2 - Antunes, Nuno
A2 - Zheng, Zheng
PB - IEEE Computer Society
T2 - 31st IEEE International Symposium on Software Reliability Engineering, ISSRE 2020
Y2 - 12 October 2020 through 15 October 2020
ER -