## Abstract

This paper shows that two commonly used evaluation metrics for generative models, the Fréchet Inception Distance (FID) and the Inception Score (IS), are biased: the expected value of the score computed for a finite sample set is not the true value of the score. Worse, the paper shows that the bias term depends on the particular model being evaluated, so model A may get a better score than model B simply because model A's bias term is smaller. This effect cannot be fixed by evaluating at a fixed number of samples. This means all comparisons using FID or IS as currently computed are unreliable. We then show how to extrapolate the score to obtain an effectively bias-free estimate of scores computed with an infinite number of samples, which we term FID_{∞} and IS_{∞}. In turn, this effectively bias-free estimate requires good estimates of scores with a finite number of samples. We show that using Quasi-Monte Carlo integration notably improves estimates of FID and IS for finite sample sets. Our extrapolated scores are simple, drop-in replacements for the finite sample scores. Additionally, we show that using low-discrepancy sequences in GAN training offers small improvements in the resulting generator. The code for calculating FID_{∞} and IS_{∞} is at https://github.com/mchong6/FID_IS_infinity.
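
The extrapolation idea described in the abstract can be illustrated with a minimal sketch. It assumes that the finite-sample score is approximately linear in 1/N, where N is the number of samples. That is an assumption of this sketch; the abstract does not state the functional form. Under that assumption, the intercept of a straight-line fit against 1/N estimates the score at N → ∞. The function name `extrapolate_to_infinity` and the synthetic numbers are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def extrapolate_to_infinity(sample_sizes, scores):
    """Estimate the infinite-sample score from finite-sample scores.

    Assumption (hypothetical, for illustration): score(N) ~ slope / N + intercept.
    The intercept is the value the fitted line takes as 1/N -> 0.
    """
    inv_n = 1.0 / np.asarray(sample_sizes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Least-squares fit of a degree-1 polynomial in 1/N.
    slope, intercept = np.polyfit(inv_n, scores, deg=1)
    return float(intercept)


# Synthetic data only: a made-up "true" score plus a bias that shrinks as 1/N.
sizes = np.array([5000, 10000, 20000, 50000])
true_value, bias_coeff = 10.0, 4000.0
finite_scores = true_value + bias_coeff / sizes

estimate = extrapolate_to_infinity(sizes, finite_scores)
print(estimate)
```

In practice the finite-sample scores would come from evaluating the metric on sample sets of several sizes; the quality of the extrapolation then depends on how accurate those finite-sample estimates are, which is the motivation the abstract gives for using Quasi-Monte Carlo integration.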

Original language | English (US) |
---|---|
Article number | 9156949 |
Pages (from-to) | 6069-6078 |
Number of pages | 10 |
Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
DOIs | |
State | Published - 2020 |
Event | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States; Duration: Jun 14 2020 → Jun 19 2020 |

## ASJC Scopus subject areas

- Software
- Computer Vision and Pattern Recognition