Understanding First-Person and Third-Person Videos in Computer Vision


Sheetal Girase
Mangesh Bedekar

Abstract

Advances in technology and social media have led to the creation of an enormous amount of visual information. A great deal of Computer Vision research considers visual information generated by either first-person (egocentric) or third-person (exocentric) cameras. Video data generated by YouTubers, surveillance cameras, and drones is referred to as third-person or exocentric video, whereas first-person or egocentric video is captured by wearable devices such as GoPro cameras and Google Glass. The exocentric view captures wide, global scenes, whereas the egocentric view captures the activities an actor performs with respect to objects. These two perspectives appear independent yet are related. In Computer Vision, they have been studied independently across domains such as Activity Recognition, Object Detection, Action Recognition, and Summarization, but their relationship and comparison are less discussed in the literature. This paper tries to bridge this gap by presenting a systematic study of first-person and third-person videos. Further, we implemented an algorithm to classify videos as first-person or third-person, achieving a validation accuracy of 88.4% and an F1-score of 86.10% on the Charades dataset.
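The abstract does not specify the classification model, so the following is only a minimal illustrative sketch, not the authors' pipeline: a frame-level binary classifier (egocentric vs. exocentric) built by fine-tuning a pretrained ResNet-18 in PyTorch and evaluated with validation accuracy and F1-score, the two metrics reported above. The frame folder layout (frames/train and frames/val with one sub-folder per class) is a hypothetical placeholder.

```python
# Hypothetical sketch: frame-level first-person vs. third-person classifier.
# The paper's abstract does not describe its model; this assumes a simple
# transfer-learning baseline (pretrained ResNet-18 with a 2-way head).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.metrics import accuracy_score, f1_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Frames are assumed to be pre-extracted into class sub-folders,
# e.g. frames/train/ego and frames/train/exo (illustrative layout).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_loader = DataLoader(datasets.ImageFolder("frames/train", preprocess),
                          batch_size=32, shuffle=True)
val_loader = DataLoader(datasets.ImageFolder("frames/val", preprocess),
                        batch_size=32)

# Pretrained backbone, final layer replaced with a 2-way (ego/exo) head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):  # small illustrative training budget
    model.train()
    for frames, labels in train_loader:
        frames, labels = frames.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(frames), labels)
        loss.backward()
        optimizer.step()

# Validation accuracy and F1-score, the metrics quoted in the abstract.
model.eval()
preds, gold = [], []
with torch.no_grad():
    for frames, labels in val_loader:
        logits = model(frames.to(device))
        preds.extend(logits.argmax(dim=1).cpu().tolist())
        gold.extend(labels.tolist())
print("validation accuracy:", accuracy_score(gold, preds))
print("validation F1-score:", f1_score(gold, preds))
```

A video-level decision would then be obtained by aggregating frame-level predictions, for example by majority voting over frames sampled from each clip; this aggregation step is likewise an assumption, since the abstract does not state how video-level labels were produced.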

Article Details

How to Cite
Girase, S. ., & Bedekar, M. . (2023). Understanding First-Person and Third-Person Videos in Computer Vision . International Journal on Recent and Innovation Trends in Computing and Communication, 11(9s), 263–271. https://doi.org/10.17762/ijritcc.v11i9s.7420
