An Adaptive Feature Centric XG Boost Ensemble Classifier Model for Improved Malware Detection and Classification

J. Pavithra
S. Selvakumara Samy

Abstract

Machine learning (ML) is widely used for malware detection and classification, and many ML approaches have been adapted to the problem; however, they still perform poorly because of weak feature selection and classification. To address this, the paper presents a novel algorithm, the Adaptive Feature Centric XG Boost Ensemble Learner Classifier (AFC-XG Boost). The proposed model is designed to handle varied malware detection data sets obtained from Kaggle. It splits the XG Boost classification process into several stages to optimize performance. In the preprocessing stage, the Feature Base Optimizer (FBO) algorithm denoises the data set, normalizes it, and removes tampering; it normalizes the data points and removes noise according to the feature values and their base information. Similarly, the standard XG Boost is optimized by adding feature selection with the Class Based Principal Component Analysis (CBPCA) algorithm, which selects features according to the fitness of each feature for the different classes. The method then generates a regression tree for each selected feature. From the generated trees, it performs classification by computing the Tree Level Ensemble Similarity (TLES) and the Class Level Ensemble Similarity (CLES). Using both, the method computes the Class Match Similarity (CMS) value, on which the malware classification is based. The proposed approach achieves 97% accuracy in malware detection and classification, with a running time of 34 seconds for 75,000 samples.
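
The abstract does not give the formulas behind FBO or CBPCA, so the following is only a minimal sketch of the two ideas it names: normalizing feature values, then ranking features by how well they fit individual classes. The function names, the min-max scaling, and the fitness score (distance of a class mean from the overall mean, in units of the overall standard deviation) are all assumptions made for illustration and are not the authors' algorithms.

```python
from statistics import mean, pstdev


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def class_fitness(rows, labels):
    """Return {feature_index: best per-class fitness}.

    Assumed fitness (not the paper's formula): |class mean - overall mean|
    divided by the overall standard deviation of the feature.
    """
    classes = sorted(set(labels))
    scores = {}
    for j, col in enumerate(zip(*rows)):
        overall_mean, overall_std = mean(col), pstdev(col)
        if overall_std == 0:
            scores[j] = 0.0  # a constant feature cannot separate classes
            continue
        per_class = []
        for c in classes:
            vals = [v for v, y in zip(col, labels) if y == c]
            per_class.append(abs(mean(vals) - overall_mean) / overall_std)
        scores[j] = max(per_class)
    return scores


def select_features(rows, labels, k):
    """Keep the indices of the k features with the highest fitness."""
    scores = class_fitness(rows, labels)
    ranked = sorted(scores, key=lambda j: (-scores[j], j))
    return sorted(ranked[:k])


# Hypothetical toy data: three features, two classes.
rows = [
    [0.0, 5.0, 1.0],
    [1.0, 5.0, 0.9],
    [10.0, 5.0, 1.1],
    [11.0, 5.0, 1.0],
]
labels = ["benign", "benign", "malware", "malware"]

normalized = min_max_normalize(rows)
print(select_features(normalized, labels, k=2))
```

In this toy data the second feature is constant, so it receives a fitness of zero and is dropped. A real pipeline would then train the boosted trees only on the retained columns.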

How to Cite
Pavithra, J., & Samy, S. S. (2022). An Adaptive Feature Centric XG Boost Ensemble Classifier Model for Improved Malware Detection and Classification. International Journal on Recent and Innovation Trends in Computing and Communication, 10(2s), 208–217. https://doi.org/10.17762/ijritcc.v10i2s.5930
