A Diaspora of Humans to Technology: VEDA Net for Sentiments and their Technical Analysis
Abstract
Background: Human sentiments are an expression of one's inner self. Visual media has emerged as one of the most potent instruments for communicating thoughts and feelings in today's world. Visual emotion analysis remains an abstract problem because of the considerable bias inherent in human cognitive processing, and machines must learn to recognize and segment emotions more reliably for future AI advancements. Most prior research has investigated only the emotion-class identifier part of the whole process. In this work, we focus on proposing a better architecture for emotion identification and on finding a better strategy to extract and process an input image for that architecture.
Objective: We investigate visual emotion detection and analysis using a densely connected block network and propose the VEDANet architecture. We show that the proposed architecture performs effectively across different datasets.
Method: Using CNN-based pre-trained architectures, we exploit the spatial hierarchies of visual features. Because an image's spatial regions convey much of its emotional content, we employ a dense block-based model, VEDANet, that focuses on the sentiment-rich regions of the image for effective emotion extraction. A substantial contribution of this work is an in-depth investigation of the proposed architecture through extensive trials on popular benchmark datasets, assessing accuracy gains over the comparable state-of-the-art. The outcomes of the study show that the proposed VED system outperforms existing ones in emotion-detection accuracy. Further, we explore an over-the-top optimization (OTO) layer to achieve higher efficiency.
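The abstract characterizes VEDANet as a dense block-based model built on CNN features but does not give layer counts or widths here, so the sketch below only illustrates the dense-connectivity pattern (each layer's output concatenated onto the block's growing feature map) with an assumed growth rate, channel widths, and a seven-class emotion head; it is a minimal PyTorch illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One BN-ReLU-Conv unit whose output is concatenated with its input."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv(torch.relu(self.bn(x)))
        return torch.cat([x, out], dim=1)  # dense connectivity

class DenseBlock(nn.Module):
    """Stack of dense layers; every layer sees the features of all earlier layers."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.block = nn.Sequential(
            *[DenseLayer(in_channels + i * growth_rate, growth_rate) for i in range(num_layers)]
        )

    def forward(self, x):
        return self.block(x)

class EmotionClassifier(nn.Module):
    """Illustrative dense block classifier head for emotion categories (assumed sizes)."""
    def __init__(self, num_classes=7, growth_rate=16, num_layers=4):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3, bias=False)
        self.dense = DenseBlock(32, growth_rate, num_layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32 + growth_rate * num_layers, num_classes)

    def forward(self, x):
        x = self.dense(self.stem(x))
        return self.fc(self.pool(x).flatten(1))

# Example: classify a single 224x224 RGB frame.
logits = EmotionClassifier()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 7])
```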
Results: Compared with recent research, the proposed model performs admirably, obtaining accuracies of 87.30% on the AffectNet dataset, 92.76% on Google FEC, 95.23% on the Yale dataset, and 97.63% on the FER2013 dataset. We also merged the model with a face detector to obtain 98.34% accuracy on real-time live frames, further supporting real-time applications. Compared with existing approaches, we achieve real-time performance with a minimal turn-around-time (TAT) trade-off by using an appropriately sized network with fewer parameters.
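The results mention coupling the classifier with a face detector for live frames but do not name the detector here; the sketch below pairs OpenCV's Haar-cascade face detector (an assumption) with the illustrative EmotionClassifier from the previous block to label faces in a webcam stream, showing one plausible way such a real-time pipeline can be wired together.

```python
import cv2
import torch

# Hypothetical setup: 'EmotionClassifier' is the illustrative model sketched above,
# and EMOTIONS is an assumed label set; neither is specified by the abstract.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
model = EmotionClassifier(num_classes=len(EMOTIONS)).eval()

# OpenCV's bundled Haar cascade stands in for the unspecified face detector.
detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # live webcam frames
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        # Crop the detected face, resize, and normalize before classification.
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
        tensor = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            label = EMOTIONS[model(tensor).argmax(dim=1).item()]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("VEDA demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```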