Data-driven based Optimal Feature Selection Algorithm using Ensemble Techniques for Classification

Jayshree Ghorpade
Balwant Sonkamble

Abstract

The paradigm shift toward advanced Machine Learning algorithms helps address challenges such as computational power, training time, and algorithmic stability. Individual feature selection techniques rarely yield appropriate feature subsets; the subsets they produce can be vulnerable to variations in the input data and thus lead to wrong conclusions. An expedient technique should therefore be designed to approximate feature relevance and improve performance on the data. Unlike prevailing techniques, the novelty of the proposed Data-driven based Optimal Feature Selection (DOFS) algorithm is the optimal k-value ‘kf’, which is determined by the data for effective feature selection and which minimizes computational complexity and expands prediction power using the gradient descent method. The experimental analysis of the proposed algorithm is demonstrated with ensemble techniques: on a dataset for a non-communicable disease, diabetes mellitus, it produces an accuracy of 80.80%, whereas a comparative performance analysis on a benchmark dataset shows an improved accuracy of 86.03%.
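The abstract does not spell out how ‘kf’ is computed, so the following is only a minimal sketch of the general idea of letting the data pick the number of selected features. It is not the authors' DOFS algorithm. Everything in it is an assumption made for illustration: the synthetic data, the two scoring functions (`mean_gap`, `fisher_score`), the rank-sum aggregation, the nearest-centroid classifier, and in particular the plain validation-accuracy sweep over k, which stands in for the gradient descent step named in the abstract.

```python
import random

def make_data(n=200, n_informative=3, n_noise=7, seed=0):
    """Hypothetical synthetic data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(1.5 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def split_by_class(X, y, j):
    """Values of feature j, separated by class label."""
    c0 = [row[j] for row, t in zip(X, y) if t == 0]
    c1 = [row[j] for row, t in zip(X, y) if t == 1]
    return c0, c1

def mean_gap(X, y, j):
    """Scorer 1 (assumed): absolute difference between class means."""
    c0, c1 = split_by_class(X, y, j)
    return abs(mean(c1) - mean(c0))

def fisher_score(X, y, j):
    """Scorer 2 (assumed): squared mean gap over summed class variances."""
    c0, c1 = split_by_class(X, y, j)
    return (mean(c1) - mean(c0)) ** 2 / (variance(c0) + variance(c1) + 1e-12)

def ensemble_ranking(X, y, scorers):
    """Sum each feature's rank across scorers; lower total rank is better."""
    d = len(X[0])
    total_rank = [0] * d
    for scorer in scorers:
        order = sorted(range(d), key=lambda j: scorer(X, y, j), reverse=True)
        for rank, j in enumerate(order):
            total_rank[j] += rank
    return sorted(range(d), key=lambda j: total_rank[j])

def centroid_accuracy(X_tr, y_tr, X_va, y_va, feats):
    """Validation accuracy of a nearest-centroid classifier on the chosen features."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in feats]
    hits = 0
    for row, t in zip(X_va, y_va):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(feats, cen))
                for c, cen in centroids.items()}
        hits += min(dist, key=dist.get) == t
    return hits / len(y_va)

def choose_k(X_tr, y_tr, X_va, y_va, ranking):
    """Sweep k over the ranked features and keep the best validation accuracy."""
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranking) + 1):
        acc = centroid_accuracy(X_tr, y_tr, X_va, y_va, ranking[:k])
        if acc > best_acc:  # strict ">" keeps the smallest k on ties
            best_k, best_acc = k, acc
    return best_k, best_acc

X, y = make_data()
X_tr, y_tr, X_va, y_va = X[:140], y[:140], X[140:], y[140:]
ranking = ensemble_ranking(X_tr, y_tr, [mean_gap, fisher_score])
k_best, acc = choose_k(X_tr, y_tr, X_va, y_va, ranking)
print("ranking:", ranking, "k:", k_best, "validation accuracy:", round(acc, 3))
```

The ranking is computed on the training split only, and k is chosen on a held-out split, so the selected k reflects the data rather than a fixed user setting. A gradient-based or cross-validated search could replace the sweep in `choose_k` without changing the rest of the sketch.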

Article Details

How to Cite
Ghorpade, J., & Sonkamble, B. (2023). Data-driven based Optimal Feature Selection Algorithm using Ensemble Techniques for Classification. International Journal on Recent and Innovation Trends in Computing and Communication, 11(4), 33–41. https://doi.org/10.17762/ijritcc.v11i4.6378
Section
Articles
