Evaluating Text-to-Image GANs Performance: A Comparative Analysis of Evaluation Metrics

K. Dinesh Kumar
Sarot Srang
Dona Valy

Abstract

Generative Adversarial Networks (GANs) have emerged as powerful techniques for generating high-quality images in various domains, but assessing how realistic the generated images are remains a challenging task. To address this issue, researchers have proposed a variety of evaluation metrics for GANs, each with its own strengths and limitations. This paper presents a comprehensive analysis of popular GAN evaluation metrics, including Fréchet Inception Distance (FID), Mode Score, Inception Score, Maximum Mean Discrepancy (MMD), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM). The strengths, weaknesses, and calculation processes of these metrics are discussed, focusing on assessing image fidelity and diversity. Two approaches, pixel distance and feature distance, are employed to measure image similarity, while the importance of evaluating individual objects using input captions is emphasized. Experimental results on a basic GAN trained on the MNIST dataset demonstrate improvement in various metrics across different epochs. The FID score decreases from 497.54594 at Epoch 0 to 136.91156 at Epoch 100, indicating that the distribution of generated images moves closer to that of the real images. In addition, the Inception Score increases from 1.1533 to 1.6408, reflecting enhanced image quality and diversity. These findings show that the GAN model generates more realistic and diverse images as training progresses. However, evaluating GANs on complex datasets poses additional challenges, highlighting the need to combine evaluation metrics with visual inspection and subjective measures of image quality. By adopting a comprehensive evaluation approach, researchers can gain a deeper understanding of GAN performance and guide the development of advanced models.
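
To make the two families of measures named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It computes PSNR as a pixel-distance measure and the Inception Score from per-image class probabilities. The function names, the flat-list image representation, and the toy probability vectors are assumptions made for illustration; in practice the probabilities would come from a pretrained Inception classifier applied to generated images.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio between two images given as flat lists of pixels.

    Hypothetical helper: images are assumed to have the same number of pixels.
    """
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels (the pixel distance).
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def inception_score(probs):
    """Inception Score: exp of the mean KL divergence between p(y|x) and p(y).

    `probs` is a list of per-image class-probability lists (assumed to sum to 1).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_total = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        kl_total += sum(
            pj * (math.log(pj) - math.log(mj))
            for pj, mj in zip(p, marginal)
            if pj > 0
        )
    return math.exp(kl_total / n)


if __name__ == "__main__":
    # Toy inputs, purely illustrative.
    print(psnr([0, 128, 255], [0, 130, 250]))
    print(inception_score([[0.9, 0.1], [0.2, 0.8]]))
```

The Inception Score is bounded below by 1, which is reached when every image receives the same class distribution, and above by the number of classes, which is reached when predictions are confident and evenly spread across classes. This is why an increasing score is read as a sign of better quality and diversity.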

Article Details

How to Cite
Kumar, K. D., Srang, S., & Valy, D. (2023). Evaluating Text-to-Image GANs Performance: A Comparative Analysis of Evaluation Metrics. International Journal on Recent and Innovation Trends in Computing and Communication, 11(8s), 618–627. https://doi.org/10.17762/ijritcc.v11i8s.7248
