FLEX: Fixing flaky tests in machine learning projects by updating assertion bounds

Saikat Dutta, August Shi, Sasa Misailovic

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Many machine learning (ML) algorithms are inherently random-multiple executions using the same inputs may produce slightly different results each time. Randomness impacts how developers write tests that check for end-to-end quality of their implementations of these ML algorithms. In particular, selecting the proper thresholds for comparing obtained quality metrics with the reference results is a non-intuitive task, which may lead to flaky test executions. We present FLEX, the first tool for automatically fixing flaky tests due to algorithmic randomness in ML algorithms. FLEX fixes tests that use approximate assertions to compare actual and expected values that represent the quality of the outputs of ML algorithms. We present a technique for systematically identifying the acceptable bound between the actual and expected output quality that also minimizes flakiness. Our technique is based on the Peak Over Threshold method from statistical Extreme Value Theory, which estimates the tail distribution of the output values observed from several runs. Based on the tail distribution, FLEX updates the bound used in the test, or selects the number of test re-runs, based on a desired confidence level. We evaluate FLEX on a corpus of 35 tests collected from the latest versions of 21 ML projects. Overall, FLEX identifies and proposes a fix for 28 tests. We sent 19 pull requests, each fixing one test, to the developers. So far, 9 have been accepted by the developers.

Original languageEnglish (US)
Title of host publicationESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsDiomidis Spinellis
PublisherAssociation for Computing Machinery
Pages603-614
Number of pages12
ISBN (Electronic)9781450385626
DOIs
StatePublished - Aug 20 2021
Event29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: Aug 23 2021Aug 28 2021

Publication series

NameESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/TerritoryGreece
CityVirtual, Online
Period8/23/218/28/21

Keywords

  • Extreme Value Theory
  • Flaky tests
  • Machine Learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Fingerprint

Dive into the research topics of 'FLEX: Fixing flaky tests in machine learning projects by updating assertion bounds'. Together they form a unique fingerprint.

Cite this