CROD: Context Aware Role based Offensive Detection using NLP/ DL Approaches


T Purnima
Ch Koteswara Rao

Abstract

With the growing use of social media, many people misuse online platforms by uploading offensive content and sharing it with a vast audience, which makes controlling such content necessary. In this work we concentrate on detecting offensive text in social media. Existing offensive text detection systems treat weak pejoratives like ‘idiot’ and extremely indecent pejoratives like ‘f***’ as equally offensive, irrespective of formal and informal contexts. In fact, weak pejoratives are casual and common in informal discussions among friends, where they are not offensive, but the same words can be offensive when used in formal discussions. The crucial challenges in role-based offensive text detection are i) considering the roles while classifying the text as offensive or not, and ii) creating contextual datasets that include both formal and informal roles. To tackle these challenges we develop a deep neural network based model called context aware role based offensive detection (CROD). We evaluate CROD on a manually created dataset collected from social networking sites. Results show that CROD performs best with RoBERTa, reaching an accuracy of 94% when the context and role in the data are considered.
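
The role dependence described in the abstract can be illustrated with a small sketch. This is not the CROD model, which the abstract describes as a deep neural network. It is a toy rule that assumes two hypothetical word lists and a hypothetical role label ("formal" or "informal"), plus one possible way of exposing the role to a text classifier by prefixing it as a tag.

```python
# Hypothetical word lists for illustration only; not taken from the paper's dataset.
WEAK_PEJORATIVES = {"idiot"}
STRONG_PEJORATIVES = {"f***"}


def is_offensive(text: str, role: str) -> bool:
    """Toy rule: strong pejoratives are always offensive,
    weak pejoratives are offensive only in a formal role."""
    # Lowercase each token and strip common trailing/leading punctuation.
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    if tokens & STRONG_PEJORATIVES:
        return True
    if tokens & WEAK_PEJORATIVES:
        return role == "formal"
    return False


def build_model_input(text: str, role: str) -> str:
    """Assumed input format: prefix the role as a tag so that a
    classifier can condition its prediction on it."""
    return f"[{role.upper()}] {text}"


print(is_offensive("You idiot!", "informal"))
print(is_offensive("You idiot!", "formal"))
print(build_model_input("You idiot!", "formal"))
```

The same sentence receives different labels depending on the role, which is the behaviour a role-aware dataset and model need to capture. A learned model would replace the hand-written rule.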

Article Details

How to Cite
Purnima, T., & Rao, C. K. (2023). CROD: Context Aware Role based Offensive Detection using NLP/ DL Approaches. International Journal on Recent and Innovation Trends in Computing and Communication, 11(1), 01–11. https://doi.org/10.17762/ijritcc.v11i1.5981
