TY - GEN
T1 - Probabilistic and Systematic Coverage of Consecutive Test-Method Pairs for Detecting Order-Dependent Flaky Tests
AU - Wei, Anjiang
AU - Yi, Pu
AU - Xie, Tao
AU - Marinov, Darko
AU - Lam, Wing
N1 - Funding Information:
We are grateful to Peter Taylor for a StackExchange post [39] that led us to the concept of Tuscan squares. We thank Dragan Stevanović, Wenyu Wang, and Zhengkai Wu for discussions about Tuscan squares and Reed Oei for comments on the paper draft. This work was partially supported by NSF grants CNS-1564274, CNS-1646305, CCF-1763788, and CCF-1816615. We also acknowledge support for research on flaky tests from Facebook and Google.
Publisher Copyright:
© The Author(s) 2021.
PY - 2021
Y1 - 2021
N2 - Software developers frequently check their code changes by running a set of tests against their code. Tests that can nondeterministically pass or fail when run on the same code version are called flaky tests. These tests are a major problem because they can mislead developers to debug their recent code changes when the failures are unrelated to these changes. One prominent category of flaky tests is order-dependent (OD) tests, which can deterministically pass or fail depending on the order in which the set of tests are run. By detecting OD tests in advance, developers can fix these tests before they change their code. Due to the high cost required to explore all possible orders (n! permutations for n tests), prior work has developed tools that randomize orders to detect OD tests. Experiments have shown that randomization can detect many OD tests, and that most OD tests depend on just one other test to fail. However, there was no analysis of the probability that randomized orders detect OD tests. In this paper, we present the first such analysis and also present a simple change for sampling random test orders to increase the probability. We finally present a novel algorithm to systematically explore all consecutive pairs of tests, guaranteeing to detect all OD tests that depend on one other test, while running substantially fewer orders and tests than simply running all test pairs.
AB - Software developers frequently check their code changes by running a set of tests against their code. Tests that can nondeterministically pass or fail when run on the same code version are called flaky tests. These tests are a major problem because they can mislead developers to debug their recent code changes when the failures are unrelated to these changes. One prominent category of flaky tests is order-dependent (OD) tests, which can deterministically pass or fail depending on the order in which the set of tests are run. By detecting OD tests in advance, developers can fix these tests before they change their code. Due to the high cost required to explore all possible orders (n! permutations for n tests), prior work has developed tools that randomize orders to detect OD tests. Experiments have shown that randomization can detect many OD tests, and that most OD tests depend on just one other test to fail. However, there was no analysis of the probability that randomized orders detect OD tests. In this paper, we present the first such analysis and also present a simple change for sampling random test orders to increase the probability. We finally present a novel algorithm to systematically explore all consecutive pairs of tests, guaranteeing to detect all OD tests that depend on one other test, while running substantially fewer orders and tests than simply running all test pairs.
KW - Flaky tests
KW - Order dependent
KW - Test-pair coverage
UR - http://www.scopus.com/inward/record.url?scp=85150180140&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150180140&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-72016-2_15
DO - 10.1007/978-3-030-72016-2_15
M3 - Conference contribution
AN - SCOPUS:85150180140
SN - 9783030720155
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 270
EP - 287
BT - Tools and Algorithms for the Construction and Analysis of Systems - 27th International Conference, TACAS 2021 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021
A2 - Groote, Jan Friso
A2 - Larsen, Kim Guldstrand
PB - Springer
T2 - 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021 Held as Part of 24th European Joint Conferences on Theory and Practice of Software, ETAPS 2021
Y2 - 27 March 2021 through 1 April 2021
ER -