VTKG: A Vision Transformer Model with Integration of Knowledge Graph for Enhanced Image Captioning

Yugandhara A. Thakare et al.

doi:10.17762/ijritcc.v11i9.8981

Published: Oct 30, 2023

Keywords:

Knowledge graph, Transformer model, Vision Transformer, image captioning.

Yugandhara A. Thakare, K. H. Walse, Mohammad Atique

Abstract

The Transformer model has exhibited impressive results in machine translation tasks. In this paper, we use it to improve the performance of image captioning. We tackle the image captioning task from a novel sequence-to-sequence perspective and present VTKG, a Vision Transformer model with an integrated Knowledge Graph: a full Transformer network that replaces the CNN in the encoder with a convolution-free Transformer encoder. To generate more meaningful captions and reduce mispredictions, we also introduce a novel approach for integrating common-sense knowledge extracted from a knowledge graph, which significantly improves the overall adaptability of our captioning model. By combining these strategies, we attain strong performance on multiple established evaluation metrics, outperforming existing benchmarks. Experimental results demonstrate improvements of 1.32%, 1.7%, 1.25%, 1.14%, 2.8%, and 2.5% in BLEU-1, BLEU-2, BLEU-4, METEOR, ROUGE-L, and CIDEr scores, respectively, compared to state-of-the-art methods.
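As a rough illustration of the sequence-to-sequence view described in the abstract, the sketch below shows how an image can be turned into a sequence of flattened patches, which is the usual input form for a convolution-free Vision Transformer encoder. It is not the authors' implementation: the function name `image_to_patch_sequence`, the 224×224 input size, and the 16-pixel patch size are assumptions chosen for illustration, and the abstract does not state the values used in VTKG.

```python
import numpy as np


def image_to_patch_sequence(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row across the image grid.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    grid_h, grid_w = height // patch_size, width // patch_size
    # Separate each spatial axis into (grid index, offset inside the patch).
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, channels)
    # Bring the two grid indices together so each patch is contiguous.
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into a sequence and each patch into a vector.
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * channels)


# Hypothetical input: a 224x224 RGB image and 16x16 patches (assumed values).
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
sequence = image_to_patch_sequence(image, patch_size=16)
print(sequence.shape)
```

In a full model, each patch vector would typically be linearly projected and combined with a positional embedding before entering the Transformer encoder; those steps are omitted here.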

How to Cite

Yugandhara A. Thakare, et al. (2023). VTKG: A Vision Transformer Model with Integration of Knowledge Graph for Enhanced Image Captioning. International Journal on Recent and Innovation Trends in Computing and Communication, 11(9), 889–896. https://doi.org/10.17762/ijritcc.v11i9.8981

Issue

Vol. 11 No. 9 (2023)

Section

Articles

Contact Us:

Auricle Global Society of Education and Research

Y-18-A, Near Sanskar Play School, Sudarshana Nagar,

Bikaner, Rajasthan (India). Pin 334003

Email: editor@ijritcc.org
