Comparative Performance of Data Mining Techniques for Cyberbullying Detection of Arabic Social Media Text

Omar Kamal Eldien Hussien
Amal Elsayed Aboutabl
Riham Mohamed Younis Haggag

Abstract

Cyberbullying has spread like a virus across social media platforms worldwide and is getting out of control. According to psychological studies on the subject, victims are suffering more and more, sometimes to the point of suicide. Social media use is growing, and it has both beneficial and harmful effects, the latter including the abuse of platforms through different forms of cyberbullying. Although there is a lot of work on cyberbullying detection in English, there are few studies on the Arabic language. Data mining techniques are often used to detect this problem. In this study, different data mining algorithms were used to detect cyberbullying in Arabic texts. The study was conducted on three datasets: the Bullying dataset, consisting of 26,000 comments written in Arabic and collected from kaggle.com; the Cyber_2021 dataset, consisting of 13,247 comments collected via github.com; and the Data 2022 dataset, consisting of 47,224 comments collected via Instagram. Two feature extraction methods, CountVectorizer and Tf-Idf, were used. Accuracy, precision, recall, and the F1 score were used to evaluate classifier performance. The Bagging Classifier achieved high results on the Bullying dataset from Kaggle (accuracy 96.04, F1-score 95.98, recall 96.04, precision 95.95). The SVC model gave the highest results on the Cyber_2021 dataset from GitHub (accuracy 98.49, F1-score 98.49, recall 98.49, precision 98.50). On the Data 2022 dataset from Instagram, the results were an accuracy of 77.51, F1-score of 76.60, recall of 77.51, and precision of 77.24. These results were achieved with the Tf-Idf vectorizer, which gave better results than CountVectorizer in all cases.
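The abstract does not say how precision, recall, and F1 were averaged across classes. On all three datasets the reported recall equals the reported accuracy, which is what support-weighted averaging produces, so the sketch below assumes weighted averaging. It is an illustration only, not the authors' code: the function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1 as fractions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n       # each class contributes in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels (assumption): 1 = bullying, 0 = not bullying.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
scores = weighted_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```

Under weighted averaging, each class's recall is multiplied by that class's share of the data, so the sum reduces to the total number of correct predictions divided by the number of examples, which is the accuracy. Precision and F1 have no such identity and can differ from accuracy, as they do in the figures reported above.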

How to Cite
Hussien, O. K. E., Aboutabl, A. E., & Haggag, R. M. Y. (2023). Comparative Performance of Data Mining Techniques for Cyberbullying Detection of Arabic Social Media Text. International Journal on Recent and Innovation Trends in Computing and Communication, 11(11s), 392–400. https://doi.org/10.17762/ijritcc.v11i11s.8167