Effectively unbiased FID and inception score and where to find them

Min Jin Chong, David Forsyth

Research output: Contribution to journalConference articlepeer-review


This paper shows that two commonly used evaluation metrics for generative models, the Fréchet Inception Distance (FID) and the Inception Score (IS), are biased - the expected value of the score computed for a finite sample set is not the true value of the score. Worse, the paper shows that the bias term depends on the particular model being evaluated, so model A may get a better score than model B simply because model A's bias term is smaller. This effect cannot be fixed by evaluating at a fixed number of samples. This means all comparisons using FID or IS as currently computed are unreliable. We then show how to extrapolate the score to obtain an effectively bias-free estimate of scores computed with an infinite number of samples, which we term FID and IS. In turn, this effectively bias-free estimate requires good estimates of scores with a finite number of samples. We show that using Quasi-Monte Carlo integration notably improves estimates of FID and IS for finite sample sets. Our extrapolated scores are simple, drop-in replacements for the finite sample scores. Additionally, we show that using low discrepancy sequence in GAN training offers small improvements in the resulting generator. The code for calculating FID and IS is at https://github.com/ mchong6/FID_IS_infinity.

Original languageEnglish (US)
Article number9156949
Pages (from-to)6069-6078
Number of pages10
JournalProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
StatePublished - 2020
Event2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: Jun 14 2020Jun 19 2020

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Cite this