Emotion is a key element in user-generated video. However, emotions conveyed in such videos are difficult to understand due to the complex and unstructured nature of user-generated content and the sparsity of video frames that express emotion. In this paper, for the first time, we propose a technique for transferring knowledge from heterogeneous external sources, including image and textual data, to facilitate video emotion recognition. Specifically, our framework learns a video encoding from an auxiliary emotional image dataset in order to improve supervised video emotion recognition. A comprehensive set of experiments on multiple datasets demonstrates the effectiveness of our framework.

The video is first separated into a number of frames, and the separated frames are then analyzed using a Convolutional Neural Network (CNN) to identify the emotion depicted in the video. Based on the emotion recognized in the frames, a corresponding audio file is played: after processing the frames through the CNN layers, the system produces a sound clip matching each emotion.
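The frame-to-audio pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the CNN classifier is stubbed out as a placeholder, and the emotion labels, audio file names, and function names are all assumptions made for the example.

```python
# Hypothetical sketch of the described pipeline: split a video into
# frames, classify each frame's emotion with a CNN, then select an
# audio clip per recognized emotion. The CNN itself is stubbed.

# Assumed mapping from emotion label to a sound clip (illustrative paths).
EMOTION_AUDIO = {
    "happy": "audio/happy.wav",
    "sad": "audio/sad.wav",
    "angry": "audio/angry.wav",
    "neutral": "audio/neutral.wav",
}


def classify_frame(frame):
    """Placeholder for the CNN frame classifier.

    A real implementation would run a trained CNN on the frame pixels
    and return an emotion label; here we assume each frame already
    carries a label so the surrounding pipeline logic can be shown.
    """
    return frame["label"]


def emotions_to_audio(frames):
    """Classify each frame and map its emotion to an audio clip path.

    Unknown emotions fall back to the "neutral" clip.
    """
    clips = []
    for frame in frames:
        emotion = classify_frame(frame)
        clips.append(EMOTION_AUDIO.get(emotion, EMOTION_AUDIO["neutral"]))
    return clips


# Example: three frames whose emotions have been identified.
frames = [{"label": "happy"}, {"label": "sad"}, {"label": "happy"}]
print(emotions_to_audio(frames))
# -> ['audio/happy.wav', 'audio/sad.wav', 'audio/happy.wav']
```

In a full system, the frame extraction step would read the video with a library such as OpenCV, and the selected clips would be handed to an audio playback component rather than printed.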