torch-fidelity: High-fidelity performance metrics for generative models in PyTorch

torch-fidelity provides precise, efficient, and extensible implementations of the popular metrics for generative model evaluation, including:

  • Inception Score (ISC)

  • Fréchet Inception Distance (FID)

  • Kernel Inception Distance (KID)

  • Perceptual Path Length (PPL)

Precision: Unlike many other reimplementations, the values produced by torch-fidelity match reference implementations up to machine precision. This makes torch-fidelity suitable for reporting metrics in papers, in place of scattered and slow reference implementations.

Efficiency: Feature sharing between different metrics saves recomputation time, and an additional caching level avoids recomputing features and statistics whenever possible. High efficiency allows using torch-fidelity in the training loop, for example at the end of every epoch.
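The feature-sharing idea can be illustrated with a minimal, standard-library-only sketch. It is not torch-fidelity's implementation: `FeatureCache`, `toy_extractor`, `metric_mean`, and `metric_spread` are hypothetical names made up for this illustration, and the toy extractor stands in for an expensive network forward pass. The point is only that two metrics requesting features for the same input trigger a single extraction.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class FeatureCache:
    """Hypothetical cache: features are extracted once per (input, extractor) pair."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], List[float]] = {}
        self.extractor_runs = 0  # counts full extraction passes, for demonstration

    def features(
        self,
        input_name: str,
        extractor_name: str,
        samples: Sequence[Sequence[float]],
        extractor: Callable[[Sequence[float]], float],
    ) -> List[float]:
        key = (input_name, extractor_name)
        if key not in self._store:
            # Cache miss: run the (expensive) extractor over all samples once.
            self.extractor_runs += 1
            self._store[key] = [extractor(s) for s in samples]
        return self._store[key]


def toy_extractor(sample: Sequence[float]) -> float:
    # Stand-in for a network forward pass: one scalar feature per sample.
    return sum(sample) / len(sample)


def metric_mean(feats: Sequence[float]) -> float:
    # Toy metric 1: average feature value.
    return sum(feats) / len(feats)


def metric_spread(feats: Sequence[float]) -> float:
    # Toy metric 2: range of feature values.
    return max(feats) - min(feats)


cache = FeatureCache()
samples = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Both metrics ask for the same features; only the first request extracts them.
results = {
    name: metric(cache.features("generated", "toy", samples, toy_extractor))
    for name, metric in {"mean": metric_mean, "spread": metric_spread}.items()
}
```

After this runs, `cache.extractor_runs` is 1 even though two metrics were computed. A persistent cache would apply the same idea across runs by keying stored features on the input and extractor.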

Extensibility: Going beyond 2D image generation is easy due to high modularity and abstraction of the metrics from input data, models, and feature extractors. For example, one can swap out the InceptionV3 feature extractor for one that accepts 3D scan volumes, such as those used in MRI.
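As a rough sketch of what abstracting a metric from its feature extractor means, the example below keeps the metric fixed and swaps only the extractor. It uses the standard library only and does not reflect torch-fidelity's classes: `evaluate`, `frechet_distance_1d`, `mean_intensity_2d`, and `mean_intensity_3d` are hypothetical names, and a one-dimensional Fréchet distance between Gaussians stands in for the full FID computation.

```python
import math
from statistics import fmean, pvariance
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")


def frechet_distance_1d(feats_a: Sequence[float], feats_b: Sequence[float]) -> float:
    # Squared Fréchet distance between two 1D Gaussians fitted to the features:
    # (mu_a - mu_b)^2 + (sigma_a - sigma_b)^2
    mu_a, mu_b = fmean(feats_a), fmean(feats_b)
    sd_a, sd_b = math.sqrt(pvariance(feats_a)), math.sqrt(pvariance(feats_b))
    return (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2


def evaluate(
    samples_a: Sequence[Sample],
    samples_b: Sequence[Sample],
    extractor: Callable[[Sample], float],
) -> float:
    # The metric sees only features, never the raw data layout.
    feats_a = [extractor(s) for s in samples_a]
    feats_b = [extractor(s) for s in samples_b]
    return frechet_distance_1d(feats_a, feats_b)


def mean_intensity_2d(image: Sequence[Sequence[float]]) -> float:
    # Hypothetical extractor for 2D images (rows of pixel values).
    return fmean(v for row in image for v in row)


def mean_intensity_3d(volume: Sequence[Sequence[Sequence[float]]]) -> float:
    # Hypothetical extractor for 3D volumes (planes of rows of voxel values).
    return fmean(v for plane in volume for row in plane for v in row)


images_a = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
images_b = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]

# Wrap each image as a single-plane volume to exercise the 3D extractor.
volumes_a = [[img] for img in images_a]
volumes_b = [[img] for img in images_b]

distance_2d = evaluate(images_a, images_b, mean_intensity_2d)
distance_3d = evaluate(volumes_a, volumes_b, mean_intensity_3d)
```

Only the extractor argument changes between the two calls; the metric code is untouched. A real replacement extractor would return high-dimensional features rather than a single scalar.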

TL;DR: fast and reliable GAN evaluation in PyTorch.


Citation is recommended to reinforce the evaluation protocol in works relying on torch-fidelity. To ensure reproducibility, use the following BibTeX:

  author={Anton Obukhov and Maximilian Seitzer and Po-Wei Wu and Semen Zhydenko and Jonathan Kyl and Elvis Yu-Jing Lin},
  title={High-fidelity performance metrics for generative models in PyTorch},
  note={Version: 0.2.0, DOI: 10.5281/zenodo.3786540}