Image Recognition Using Text and Audio Translation for the Visually Challenged


Rishita Khurana, Preeti Manani, Nripendra Narayan Das, Manika, Madhulika, Ashish Grover, Richa Adlakha

Abstract

According to the WHO, 253 million people worldwide are visually impaired, and many of them find it difficult to carry out their everyday lives. It is important to take meaningful steps with current technology so that they can experience the world around them without difficulty. To support visually impaired people, this project proposes a system that identifies an image, translates the image's description into text, and then produces audio. This can help a person read any text, recognize an image, and receive the result in spoken form. Motivated by recent work in machine translation and object recognition, a CNN-RNN based attention model is presented in this project. In the proposed framework, an image is first converted into a text description; then, using a basic text-to-speech API, the extracted caption is converted into speech, which further helps visually impaired users understand the image or visuals in front of them. The central part of the work is therefore building the captioning model, while the second part, converting text to speech, is relatively simple with a text-to-speech API. Once the model is built, it is deployed on a local system as a Flask application that produces an audio caption for any image fed to the model.
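The pipeline described in the abstract (image → CNN-RNN caption → text-to-speech) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the function names (`extract_features`, `decode_caption`, `caption_to_speech`, `describe_image`) are hypothetical, and both model stages are stand-ins for the real CNN encoder, RNN attention decoder, and TTS API.

```python
def extract_features(image_path: str) -> list[float]:
    """Stand-in for the CNN encoder; a real system would run a
    pretrained convolutional network over the image."""
    return [0.0] * 4  # placeholder feature vector

def decode_caption(features: list[float]) -> str:
    """Stand-in for the RNN attention decoder; a real decoder would
    generate the caption word by word from the features."""
    return "a person riding a bicycle on a street"

def caption_to_speech(caption: str, out_path: str) -> str:
    """Stand-in for the text-to-speech step; a real system might call
    a TTS library here, e.g. gTTS(caption).save(out_path)."""
    return f"spoken caption for '{caption}' saved to {out_path}"

def describe_image(image_path: str) -> str:
    """End-to-end pipeline: image -> features -> caption -> speech."""
    features = extract_features(image_path)
    caption = decode_caption(features)
    return caption_to_speech(caption, "caption.mp3")
```

In the deployed system, `describe_image` would sit behind a Flask route that accepts an uploaded image and returns the generated audio file.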

Article Details

How to Cite
Rishita Khurana, et al. (2023). Image Recognition Using Text and Audio Translation for the Visually Challenged. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 2164–2181. https://doi.org/10.17762/ijritcc.v11i10.8904
Section
Articles
Author Biography

Rishita Khurana, Preeti Manani, Nripendra Narayan Das, Manika, Madhulika, Ashish Grover, Richa Adlakha

Rishita Khurana1, Preeti Manani2, Nripendra Narayan Das3, Manika4, Madhulika5, Ashish Grover6, Richa Adlakha7

1Department of Computer Science and Engineering, Amity University, Noida, India

rishitaakhurana14@gmail.com

2Faculty of Education, Dayalbagh Educational Institute (Deemed to be University), Agra

preetimanani.1708@gmail.com


3Corresponding Author, Department of Information Technology, Manipal University Jaipur, Rajasthan, India

nripendradas@gmail.com

4Department of Computer Science and Engineering, Amity University, Noida, India

manikachoudhary58@gmail.com

5Department of Computer Science and Engineering, Amity University, Noida, India

drmadhulikabhatia@gmail.com

6Department of Electrical and Electronics Engineering, MRIIRS, Faridabad

Ashi.21s@gmail.com

7Department of Electrical and Electronics Engineering, MRIIRS, Faridabad

Richaadlakaha.fet@mriu.edu.in