TERA: Optimizing stochastic regression tests in machine learning projects

Saikat Dutta, Jeeva Selvam, Aryaman Jain, Sasa Misailovic

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

The stochastic nature of many Machine Learning (ML) algorithms makes testing ML tools and libraries challenging. ML algorithms let a developer control their accuracy and run time through a set of hyper-parameters, which are typically selected manually in tests. This choice is often too conservative and leads to slow test executions, thereby increasing the cost of regression testing. We propose TERA, the first automated technique for reducing the cost of regression testing in ML tools and libraries (jointly referred to as projects) without making the tests more flaky. TERA casts the problem of exploring the trade-off between a test's execution time and its flakiness as an instance of Stochastic Optimization over the space of algorithm hyper-parameters. TERA shows how to leverage statistical convergence-testing techniques to estimate the level of flakiness of a test for a specific choice of hyper-parameters during optimization. We evaluate TERA on a corpus of 160 tests selected from 15 popular ML projects. Overall, TERA obtains a geo-mean speedup of 2.23x over the original tests for a minimum passing-probability threshold of 99%. Through a mutation study and a study of 12 historical build failures in the studied projects, we also show that the new tests do not reduce fault-detection ability.
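The abstract combines two ideas: estimating how often a test passes under a given hyper-parameter setting, and searching for the cheapest setting that still meets a passing-probability threshold such as 99%. The sketch below illustrates that combination under loudly stated assumptions. It is not TERA's implementation. Wald's sequential probability ratio test (SPRT) stands in for the convergence-testing step. A plain cheapest-first scan over a few candidate values replaces the stochastic (Bayesian, per the keywords) optimizer. The functions `sprt_accepts`, `cheapest_reliable` and `make_simulated_test` are hypothetical, as is the pass-probability model `1 - exp(-iterations / 50)` and the assumption that test cost grows with the hyper-parameter value.

```python
import math
import random
from typing import Callable, Optional, Sequence


def sprt_accepts(run_test: Callable[[], bool], threshold: float = 0.99,
                 delta: float = 0.005, alpha: float = 0.05, beta: float = 0.05,
                 max_runs: int = 10_000) -> bool:
    """Wald's SPRT (a stand-in, not necessarily TERA's convergence test):
    accept if the pass probability p looks >= threshold + delta, reject if
    it looks <= threshold - delta, using repeated runs of run_test."""
    p0, p1 = threshold - delta, threshold + delta
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("need 0 < threshold - delta < threshold + delta < 1")
    upper = math.log((1 - beta) / alpha)    # crossing it accepts H1: p >= p1
    lower = math.log(beta / (1 - alpha))    # crossing it accepts H0: p <= p0
    step_pass = math.log(p1 / p0)           # log-likelihood ratio of one pass
    step_fail = math.log((1 - p1) / (1 - p0))  # log-likelihood ratio of one failure
    llr = 0.0
    for _ in range(max_runs):
        llr += step_pass if run_test() else step_fail
        if llr >= upper:
            return True
        if llr <= lower:
            return False
    return False  # undecided within the run budget: conservatively reject


def cheapest_reliable(candidates: Sequence[int],
                      make_test: Callable[[int], Callable[[], bool]],
                      threshold: float = 0.99) -> Optional[int]:
    """Hypothetical search: try values from cheapest to most expensive
    (assumes cost grows with the value) and return the first one the SPRT
    accepts; None if no candidate is reliable enough."""
    for value in sorted(candidates):
        if sprt_accepts(make_test(value), threshold=threshold):
            return value
    return None


def make_simulated_test(iterations: int, rng: random.Random) -> Callable[[], bool]:
    """Hypothetical stand-in for an ML test: more training iterations give a
    higher pass probability, modeled as 1 - exp(-iterations / 50)."""
    pass_prob = 1.0 - math.exp(-iterations / 50)
    return lambda: rng.random() < pass_prob


rng = random.Random(0)  # fixed seed so the simulated runs are reproducible
best = cheapest_reliable([1000, 50, 500, 100],
                         lambda it: make_simulated_test(it, rng))
print(best)
```

With these assumed settings the scan should settle on 500 iterations. The simulated pass probabilities at 50 and 100 iterations (about 0.63 and 0.86) fall far below 0.99, so the SPRT rejects them after a handful of failures. At 500 iterations failures are rare enough that the SPRT accepts after roughly 300 runs. A sequential test is used here because it stops as soon as the evidence is decisive, which keeps the number of repeated test executions per candidate small.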

Original language: English (US)
Title of host publication: ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
Editors: Cristian Cadar, Xiangyu Zhang
Publisher: Association for Computing Machinery
Pages: 413-426
Number of pages: 14
ISBN (Electronic): 9781450384599
State: Published - Jul 11 2021
Event: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021 - Virtual, Online, Denmark
Duration: Jul 11 2021 to Jul 17 2021

Publication series

Name: ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Conference

Conference: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021
Country/Territory: Denmark
City: Virtual, Online
Period: 7/11/21 to 7/17/21

Keywords

  • Bayesian Optimization
  • Machine Learning
  • Software Testing
  • Test Optimization

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
