[1] Himanshu Tyagi et al., “TAPER-WE: Transformer-Based Model Attention with Relative Position Encoding and Word Embedding for Video Captioning and Summarization in Dense Environment”, IJRITCC, vol. 11, no. 9, pp. 4851–4857, Nov. 2023.