Deep Learning for Dense Interpretation of Video: Survey of Various Approach, Challenges, Datasets and Metrics

Kiran P Kamble, Vijay R. Ghorpade

Abstract

Video interpretation has attracted considerable attention in computer vision and natural language processing, driven by the rapid growth of video data and rising demand for applications such as intelligent video search, automated video subtitling, and assistance for visually impaired individuals. It remains challenging, however, because a video carries both temporal and spatial information. Deep learning models for images, text, and audio have made significant progress, and recent efforts have turned to developing deep networks for video interpretation. Given the many techniques, datasets, features, and evaluation criteria available in the video domain, a thorough evaluation of current research is needed to guide future work. This study surveys recent advances in deep learning for dense video interpretation, covering the available datasets and the challenges they present, as well as the key features used in video interpretation. It also gives an overview of recent deep learning models for video interpretation, which have been instrumental in activity identification and in video description or captioning, and compares the performance of several of these models on specific metrics. Finally, it summarizes future trends and directions in video interpretation.

How to Cite
Kamble, K. P., & Ghorpade, V. R. (2024). Deep Learning for Dense Interpretation of Video: Survey of Various Approach, Challenges, Datasets and Metrics. International Journal on Recent and Innovation Trends in Computing and Communication, 11(10), 2812–2831. Retrieved from https://ijritcc.org/index.php/ijritcc/article/view/10312