Detecting flaky tests in probabilistic and machine learning applications

Saikat Dutta, August Shi, Rutvik Choudhary, Zhekun Zhang, Aryaman Jain, Sasa Misailovic

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Probabilistic programming systems and machine learning frameworks like Pyro, PyMC3, TensorFlow, and PyTorch provide scalable and efficient primitives for inference and training. However, such operations are non-deterministic. Hence, it is challenging for developers to write tests for applications that depend on such frameworks, often resulting in flaky tests - tests which fail non-deterministically when run on the same version of code. In this paper, we conduct the first extensive study of flaky tests in this domain. In particular, we study the projects that depend on four frameworks: Pyro, PyMC3, TensorFlow-Probability, and PyTorch. We identify 75 bug reports/commits that deal with flaky tests, and we categorize the common causes and fixes for them. This study provides developers with useful insights on dealing with flaky tests in this domain. Motivated by our study, we develop a technique, FLASH, to systematically detect flaky tests due to assertions passing and failing in different runs on the same code. These assertions fail due to differences in the sequence of random numbers in different runs of the same test. FLASH exposes such failures, and our evaluation on 20 projects results in 11 previously-unknown flaky tests that we reported to developers.

Original languageEnglish (US)
Title of host publicationISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
EditorsSarfraz Khurshid, Corina S. Pasareanu
PublisherAssociation for Computing Machinery, Inc
Pages211-224
Number of pages14
ISBN (Electronic)9781450380089
DOIs
StatePublished - Jul 18 2020
Event29th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2020 - Virtual, Online, United States
Duration: Jul 18 2020Jul 22 2020

Publication series

NameISSTA 2020 - Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis

Conference

Conference29th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2020
CountryUnited States
CityVirtual, Online
Period7/18/207/22/20

Keywords

  • Flaky tests
  • Machine Learning
  • Non-Determinism
  • Probabilistic Programming
  • Randomness

ASJC Scopus subject areas

  • Software

Fingerprint Dive into the research topics of 'Detecting flaky tests in probabilistic and machine learning applications'. Together they form a unique fingerprint.

Cite this