Phishing Detection using Base Classifier and Ensemble Technique


Mithilesh Kumar Pandey
Rekha Pal
Saurabh Pal
Arvind Kumar Shukla
Manish Ranjan Pandey
Shantanu Shahi

Abstract

Phishing attacks remain a significant threat, and both individuals and organizations fall victim to them regularly. A primary vehicle for these attacks is the phishing website, which is designed to look like a legitimate site in order to trick users into giving away personal information such as credit card details and passwords. This paper proposes a model that uses several benchmark classifiers, namely logistic regression (LR), Bagging, random forest (RF), k-nearest neighbors (K-NN), decision tree (DT), support vector machine (SVM), and AdaBoost, to identify and classify phishing websites. The classifiers are evaluated on accuracy, precision, recall, F1-score, and the confusion matrix. In addition, a stacking model with a meta-learner was built to identify phishing websites. The proposed stacking-based ensemble approach proved highly effective at distinguishing legitimate from phishing websites, achieving an accuracy of up to 97.19%, with precision, recall, and F1-score of 97%, 98%, and 98%, respectively. The authors therefore recommend ensemble learning, particularly stacking and its meta-learner variations, for detecting and preventing phishing attacks and other cyber threats.
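The stacking setup described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the synthetic dataset, the hyperparameters, the feature scaling, and the choice of logistic regression as the meta-learner are all assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a phishing dataset
# (label 1 = phishing, label 0 = legitimate).
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The seven base classifiers named in the abstract.
# Hyperparameters are illustrative, not taken from the paper.
base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("bagging", BaggingClassifier(n_estimators=25, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("adaboost", AdaBoostClassifier(n_estimators=50, random_state=0)),
]

# Assumption: logistic regression as the meta-learner. It is trained on
# cross-validated predictions of the base learners (cv=5).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The evaluation measures listed in the abstract.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary")
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
print(cm)
```

The scores printed by this sketch come from synthetic data and are not comparable to the 97.19% accuracy reported in the paper. Swapping `final_estimator` for another classifier is one way to explore the meta-learner variations the abstract mentions.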

Article Details

How to Cite
Pandey, M. K., Pal, R., Pal, S., Shukla, A. K., Pandey, M. R., & Shahi, S. (2023). Phishing Detection using Base Classifier and Ensemble Technique. International Journal on Recent and Innovation Trends in Computing and Communication, 11(11s), 367–376. https://doi.org/10.17762/ijritcc.v11i11s.8164
