TY - GEN
T1 - Revisiting Test-Case Prioritization on Long-Running Test Suites
AU - Cheng, Runxiang
AU - Wang, Shuai
AU - Jabbarvand, Reyhaneh
AU - Marinov, Darko
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/9/11
Y1 - 2024/9/11
N2 - The prolonged continuous integration (CI) runs are affecting timely feedback to software developers. Test-case prioritization (TCP) aims to expose faults sooner by reordering tests such that the ones more likely to fail are run earlier. TCP is thus especially important for long-running test suites. While many studies have explored TCP, they are based on outdated CI builds from over 10 years ago with test suites that last several minutes, or builds from inaccessible, proprietary projects. In this paper, we present LRTS, the first dataset of long-running test suites, with 21,255 CI builds and 57,437 test-suite runs from 10 large-scale, open-source projects that use Jenkins CI. LRTS spans from 2020 to 2023, with an average test-suite run duration of 6.5 hours. On LRTS, we study the effectiveness of 59 leading TCP techniques, the impact of confounding test failures on TCP, and TCP for failing tests with no prior failures. We revisit prior key findings (9 confirmed, 2 refuted) and establish 3 new findings. Our results show that prioritizing faster tests that recently failed performs the best, outperforming the sophisticated techniques.
AB - The prolonged continuous integration (CI) runs are affecting timely feedback to software developers. Test-case prioritization (TCP) aims to expose faults sooner by reordering tests such that the ones more likely to fail are run earlier. TCP is thus especially important for long-running test suites. While many studies have explored TCP, they are based on outdated CI builds from over 10 years ago with test suites that last several minutes, or builds from inaccessible, proprietary projects. In this paper, we present LRTS, the first dataset of long-running test suites, with 21,255 CI builds and 57,437 test-suite runs from 10 large-scale, open-source projects that use Jenkins CI. LRTS spans from 2020 to 2023, with an average test-suite run duration of 6.5 hours. On LRTS, we study the effectiveness of 59 leading TCP techniques, the impact of confounding test failures on TCP, and TCP for failing tests with no prior failures. We revisit prior key findings (9 confirmed, 2 refuted) and establish 3 new findings. Our results show that prioritizing faster tests that recently failed performs the best, outperforming the sophisticated techniques.
KW - regression testing
KW - reliability
KW - Software testing
KW - test prioritization
UR - http://www.scopus.com/inward/record.url?scp=85205581566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205581566&partnerID=8YFLogxK
U2 - 10.1145/3650212.3680307
DO - 10.1145/3650212.3680307
M3 - Conference contribution
AN - SCOPUS:85205581566
T3 - ISSTA 2024 - Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
SP - 615
EP - 627
BT - ISSTA 2024 - Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
A2 - Christakis, Maria
A2 - Pradel, Michael
PB - Association for Computing Machinery
T2 - 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2024
Y2 - 16 September 2024 through 20 September 2024
ER -